Skip to content

DOC Fix default values, shape and dtype for preprocessing._discretization.py #18290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 31, 2020

Conversation

franslarsson
Copy link
Contributor

Reference Issues/PRs

See #15761

What does this implement/fix? Explain your changes.

This PR change how default values are documented in sklearn.preprocessing._discretization.py and update dtype in docstring.

Any other comments?

@franslarsson franslarsson changed the title DOC Fix default values, shape and dtype DOC Fix default values, shape and dtype for preprocessing._discretization.py Aug 29, 2020
Copy link
Member

@alfaro96 alfaro96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @franslarsson for your PR!

I have a few suggestions.

@@ -63,11 +63,11 @@ class KBinsDiscretizer(TransformerMixin, BaseEstimator):

Attributes
----------
n_bins_ : int array, shape (n_features,)
n_bins_ : ndarray of shape (n_features,), dtype=np.int_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering why np.int_ is used here instead of np.int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find np.int to be a valid dtype in https://numpy.org/doc/stable/user/basics.types.html (similar to the comment in #18025 (comment)). But as I understand it np.int_ is platform dependent and will be either np.int32 or np.int64. Therefore, I thought I could be a good dtype to use here. Please correct me if I'm wrong or misunderstood something. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say dtype={np.int32, np.int64} to avoid platform-dependent definitions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.int would be fine with me:

In [1]: import numpy as np

In [2]: np.int
Out[2]: int

In [3]: type(np.int)
Out[3]: type

This is a valid type and would be platform-specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not sure we should indicate the dtypes everywhere, particularly where the input can be of a variety of dtypes which can be cast to float64 as long as check_array works.

@rth rth merged commit 0784207 into scikit-learn:master Aug 31, 2020
@franslarsson franslarsson deleted the doc-default-discretization branch September 3, 2020 20:33
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…tion.py (scikit-learn#18290)

Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants