Skip to content

DEPR Change default value of random_state in SVD #19370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

cliffordEmmanuel
Copy link
Contributor

Reference Issues/PRs

References #18584

What does this implement/fix? Explain your changes.

Fixes the default value of random state in randomized_svd and provides a deprecation warning should the value not be provided.

Any other comments?

@cinbez provided assistance

@reshamas
Copy link
Member

reshamas commented Feb 6, 2021

#DataUmbrella sprint

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error in the CI is related to your change. You can see the logs and see where it fails and fix that in the same PR.

@glemaitre glemaitre changed the title Random svd parameter change DEPR Change default value of random_state in SVD Feb 6, 2021
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@ogrisel
Copy link
Member

ogrisel commented Feb 6, 2021

I think we should review tests that are calling randomized_svd without passing an explicit value for random_state to update them (and pass random_state=0 or random_state=42 explicitly).

I believe there is a few of such cases in test_extmath.py:

git grep "randomized_svd(" | grep test_

Comment on lines +305 to +306
During the deprecation cycle this will set to zero and afterwards
it will be set to None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
During the deprecation cycle this will set to zero and afterwards
it will be set to None.
The previous behavior (`random_state=0`) is deprecated, and
from v1.2 the default value will be `random_state=None`. Set
the value of `random_state` explicitly to suppress the deprecation
warning.

WDYT @glemaitre @ogrisel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@adrinjalali
Copy link
Member

You have linter issues (some lines are too long now)

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be even more explicit in the deprecation message:

"it will be set to 0 starting from version 1.2 "
"Set 'random_state' to an integer value to silence "
"this warning, "
"or to 0 to keep the behavior of versions <1.0 ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rather say:

"If random_state is not supplied, the current default is to use 0 as a fixed seed. This will change to None in version 1.2 leading to non-deterministic results that better reflect the nature of the randomized_svd solver. If you want to silence this warning, set random_state to an integer seed or to None explicitly depending if you want your code to be deterministic or not."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks very much

@lorentzenchr
Copy link
Member

The "ci/circleci: lint" tells you that there are a few line with more than 79 characters in the file sklearn/utils/tests/test_extmath.py: lines 253, 261, 311, 315, 405.

As stated in the developer guide, you might want to run

git diff upstream/main -u -- "*.py" | flake8 --diff

in order to detect PEP8 violations before pushing.

If you do this more often, it can help to use a pre-commit hook (also described in the developer guide) with

pip install pre-commit
pre-commit install

inside your scikit-learn directory.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the remaining comments above are taken care of!

@cliffordEmmanuel
Copy link
Contributor Author

LGTM once the remaining comments above are taken care of!

Hi please pardon my newbie question but what does LGTM mean?

@cliffordEmmanuel
Copy link
Contributor Author

The "ci/circleci: lint" tells you that there are a few line with more than 79 characters in the file sklearn/utils/tests/test_extmath.py: lines 253, 261, 311, 315, 405.

As stated in the developer guide, you might want to run

git diff upstream/main -u -- "*.py" | flake8 --diff

in order to detect PEP8 violations before pushing.

If you do this more often, it can help to use a pre-commit hook (also described in the developer guide) with

pip install pre-commit
pre-commit install

inside your scikit-learn directory.

I really appreciate this suggestion, could you kindly show me how to navigate to error lines, I was trying to use the details but it quickly became overwhelming and I couldn't find the error lines.
Thanks

@lorentzenchr
Copy link
Member

LGTM = looks good to me

@lorentzenchr
Copy link
Member

I really appreciate this suggestion, could you kindly show me how to navigate to error lines, I was trying to use the details but it quickly became overwhelming and I couldn't find the error lines.

How to navigate to error lines, but where? Do you mean in the CI (continuous integration)? I scroll until I find something. Often, searching for "error" or "failure" works.

Comment on lines +175 to +178
<<<<<<< HEAD
=======

>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<<<<<<< HEAD
=======
>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065

I guess, those are the remnant from resolving a merge conflict.

Comment on lines +187 to +190
<<<<<<< HEAD
=======

>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<<<<<<< HEAD
=======
>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065

Comment on lines +198 to +201
<<<<<<< HEAD
=======

>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<<<<<<< HEAD
=======
>>>>>>> c8456e71c15c36ed47e094263d7d0e0c57bd8065

@reshamas
Copy link
Member

Hello @cliffordEmmanuel @cinbez
How is this PR going? Do you have any questions on how to move this PR forward?

@cinbez
Copy link
Contributor

cinbez commented Feb 16, 2021

HI @reshamas - I have tried to get hold of @cliffordEmmanuel but with no luck. I have opened my own pull request solving this same issue here. Waiting for it to be approved.

@rth rth closed this in #19670 Mar 15, 2021
@cliffordEmmanuel cliffordEmmanuel deleted the random_svd_parameter_change branch March 22, 2021 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants