Open
Labels
Categorical (Categorical Data Type), Dtype Conversions (Unexpected or buggy dtype conversions)
Description
When creating a pandas Series/Index/DataFrame, I think we generally differentiate between passing a pandas object with object dtype and a numpy array with object dtype:
>>> pd.options.future.infer_string = True
>>> pd.Index(pd.Series(["foo", "bar", "baz"], dtype="object"))
Index(['foo', 'bar', 'baz'], dtype='object')
>>> pd.Index(np.array(["foo", "bar", "baz"], dtype="object"))
Index(['foo', 'bar', 'baz'], dtype='str')
So for pandas objects we preserve the dtype, whereas a numpy array of object dtype is essentially treated as a sequence of Python objects for which we infer the dtype (@jbrockmendel is that also your understanding?).
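For anyone who wants to check this locally, here is a minimal sketch of the comparison. The helper name `constructed_dtypes` is hypothetical and only for illustration. The snippet does not touch `future.infer_string` itself, so the dtype reported for the numpy input depends on how that option is set in your environment.

```python
import numpy as np
import pandas as pd


def constructed_dtypes(values):
    """Return the Index dtype obtained from a pandas object vs. a numpy array.

    Hypothetical helper for illustration; not part of pandas.
    """
    as_series = pd.Series(values, dtype="object")   # pandas object, object dtype
    as_ndarray = np.array(values, dtype="object")   # numpy array, object dtype
    return {
        "from_series": pd.Index(as_series).dtype,
        "from_ndarray": pd.Index(as_ndarray).dtype,
    }


result = constructed_dtypes(["foo", "bar", "baz"])
# "from_series" keeps the object dtype; "from_ndarray" is subject to inference
# and depends on the future.infer_string setting.
print(result)
```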
But for categorical that doesn't seem to happen:
>>> pd.options.future.infer_string = True
>>> pd.Categorical(pd.Series(["foo", "bar", "baz"], dtype="object"))
['foo', 'bar', 'baz']
Categories (3, str): [bar, baz, foo] # <--- categories inferred as str
So do we want to preserve the dtype for the categories here as well?
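To make the question concrete, the sketch below compares the default construction with one where the categories are passed explicitly as an object-dtype Index through `pd.CategoricalDtype`. This rests on my assumption that an Index passed as categories is used as given. It is not a proposal for how the fix should look, and the variable names are just for illustration.

```python
import pandas as pd

ser = pd.Series(["foo", "bar", "baz"], dtype="object")

# Default construction: the categories are derived from the values, so their
# dtype is whatever pandas infers (this is the behavior questioned above).
inferred = pd.Categorical(ser)

# Explicit construction: build the categories as an object-dtype Index first
# and hand them over via CategoricalDtype (assumption: they are kept as given).
object_categories = pd.Index(sorted(ser.unique()), dtype="object")
explicit = pd.Categorical(ser, dtype=pd.CategoricalDtype(object_categories))

print(inferred.categories.dtype)  # depends on the future.infer_string setting
print(explicit.categories.dtype)
```

If the categories dtype were preserved for pandas objects, the two printed dtypes would match when `future.infer_string` is enabled.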