Open
Labels
API Design, Needs Discussion (requires discussion from core team before further action)
Description
This was already implemented before 2.0 in #50748, but was removed before the release in #51853 because the option was not being respected in too many cases.
The idea is to have a global option that tells pandas which dtype kind to use when data is created (the exact option name needs to be discussed, but I'll use `use_arrow` to illustrate):
pandas.options.mode.use_arrow = True
df = pandas.read_csv(...) # The returned DataFrame will use pyarrow dtypes
df["foo"] = 1 # The added column will use pyarrow dtypes
df = pandas.DataFrame(...) # The returned DataFrame will use pyarrow dtypes
...
I don't think adding the option is controversial, as it has no impact on users unless set, and it was already implemented without objections in the past.
I think the implementation requires a bit of discussion, as the exact behavior to implement is not immediately obvious, at least to me. The main points I can see:
- Should we have an option to set pyarrow as the default (since those should be the types we expect people to use in the future), or a more generic option to set `dtype_backend` to `numpy|nullable|pyarrow`?
- I think at least initially it makes sense that if a user is specific about the dtype they want to use (e.g. `Series([1, 2], dtype="Int32")`) we let them do it. But could it make sense to have a second option, `force_arrow` or `force_dtype_backend`, so that any operation that would use another dtype kind would fail? I think this could be helpful for users who only want to live in the pyarrow world, and it would also help us identify undesired casts.
- The exact namespace (`mode` vs `future` vs others) and name of the option, which clearly will depend on the previous points.
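
To make the first two points concrete, here is a minimal pure-Python sketch of how the two options could interact. It is not pandas code: `Options`, `backend_of`, and `resolve_backend` are hypothetical names invented for illustration, the option names are only the candidates mentioned above, and the dtype classification is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Optional

BACKENDS = ("numpy", "nullable", "pyarrow")

# Names of pandas nullable extension dtypes, used only to classify dtype strings.
NULLABLE_NAMES = {
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64", "boolean", "string",
}


@dataclass
class Options:
    # Hypothetical option names; the actual names and namespace are undecided.
    dtype_backend: str = "numpy"
    force_dtype_backend: bool = False


def backend_of(dtype: str) -> str:
    """Classify a dtype string by backend (simplified for this sketch)."""
    if dtype.endswith("[pyarrow]"):
        return "pyarrow"
    if dtype in NULLABLE_NAMES:
        return "nullable"
    return "numpy"


def resolve_backend(options: Options, dtype: Optional[str] = None) -> str:
    """Return the backend that newly created data would use."""
    if options.dtype_backend not in BACKENDS:
        raise ValueError(f"unknown dtype_backend: {options.dtype_backend!r}")
    if dtype is None:
        # No explicit dtype: the global default decides.
        return options.dtype_backend
    requested = backend_of(dtype)
    if options.force_dtype_backend and requested != options.dtype_backend:
        # Strict mode: an explicit dtype from another backend is an error.
        raise TypeError(
            f"dtype {dtype!r} uses the {requested!r} backend, "
            f"but {options.dtype_backend!r} is enforced"
        )
    # Non-strict mode: an explicit dtype always wins over the default.
    return requested
```

With `Options(dtype_backend="pyarrow")`, an explicit `"Int32"` resolves to the nullable backend, as in the `Series([1, 2], dtype="Int32")` case. With `force_dtype_backend=True` as well, the same request raises, which is the behavior that would surface undesired casts.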