-
-
Notifications
You must be signed in to change notification settings - Fork 26.1k
DOC Fix default values, shape and dtype for preprocessing._discretization.py #18290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Fix default values, shape and dtype for preprocessing._discretization.py #18290
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @franslarsson for your PR!
I have a few suggestions.
@@ -63,11 +63,11 @@ class KBinsDiscretizer(TransformerMixin, BaseEstimator): | |||
|
|||
Attributes | |||
---------- | |||
n_bins_ : int array, shape (n_features,) | |||
n_bins_ : ndarray of shape (n_features,), dtype=np.int_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why np.int_
is used here instead of np.int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find np.int
to be a valid dtype in https://numpy.org/doc/stable/user/basics.types.html (similar to the comment in #18025 (comment)). But as I understand it np.int_ is platform dependent and will be either np.int32 or np.int64. Therefore, I thought I could be a good dtype to use here. Please correct me if I'm wrong or misunderstood something. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say dtype={np.int32, np.int64}
to avoid platform-dependent definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.int
would be fine with me:
In [1]: import numpy as np
In [2]: np.int
Out[2]: int
In [3]: type(np.int)
Out[3]: type
This is a valid type and would be platform-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue with np.int is that it's deprecated https://numpy.org/devdocs/release/1.20.0-notes.html#using-the-aliases-of-builtin-types-like-np-int-is-deprecated
Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'm not sure we should indicate the dtypes everywhere, particularly where the input can be of a variety of dtypes which can be cast to float64 as long as check_array
works.
…tion.py (scikit-learn#18290) Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
Reference Issues/PRs
See #15761
What does this implement/fix? Explain your changes.
This PR change how default values are documented in sklearn.preprocessing._discretization.py and update dtype in docstring.
Any other comments?