
[MRG] Fixes test_scale_and_stability by clipping small values #13903


Merged: 25 commits into scikit-learn:master, May 23, 2019

Conversation

thomasjpfan
Member

@thomasjpfan thomasjpfan commented May 18, 2019

Reference Issues/PRs

Fixes #6279

What does this implement/fix? Explain your changes.

Clips small values in _PLS.fit.

This has been causing errors in our nightly tests (scipy-dev). It will start to cause errors in our regular tests now that scipy 1.3.0 is out.
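In spirit, the clipping looks something like this (a minimal sketch; the helper name and exact threshold are illustrative, not the PR's code):

```python
import numpy as np

def clip_small_values(Y):
    # Illustrative sketch: zero out entries below machine epsilon for
    # Y's dtype, so they cannot later be inverted into huge numbers.
    Y = Y.copy()
    Y[np.abs(Y) < np.finfo(Y.dtype).eps] = 0.0
    return Y

Y = np.array([[1e-17, 1.0],
              [-1e-18, 2.0]])
print(clip_small_values(Y))
# [[0. 1.]
#  [0. 2.]]
```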

@qinhanmin2014
Member

This test is still failing.
What has changed in numpy/scipy?

@thomasjpfan
Member Author

It's this PR that changed the default behavior of cond in pinv2: scipy/scipy#10067
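For context, the cutoff controls which singular values get inverted. pinv2 has since been removed from SciPy, so here is the same effect shown with numpy.linalg.pinv and its rcond parameter (illustrative, not the scikit-learn code):

```python
import numpy as np

# A matrix with one tiny singular value.
A = np.diag([1.0, 1e-14])

# Loose cutoff: the tiny singular value is treated as zero and ignored.
loose = np.linalg.pinv(A, rcond=1e-10)

# Tight cutoff: the tiny singular value is inverted, producing ~1e14.
tight = np.linalg.pinv(A, rcond=1e-15)

print(loose[1, 1])  # the direction is dropped
print(tight[1, 1])  # huge values leak into the pseudo-inverse
```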

@thomasjpfan
Member Author

I changed the tol for the failing estimator, CCA. This fixed the issue locally; let's see what our CI thinks.

@qinhanmin2014 qinhanmin2014 left a comment
Member

I'm not familiar with that algorithm, but this seems reasonable as long as CI is green.

@qinhanmin2014 qinhanmin2014 added this to the 0.21.2 milestone May 18, 2019
@thomasjpfan thomasjpfan changed the title [MRG] "Fixes" test_scale_and_stability by reducing decimals [MRG] "Fixes" test_scale_and_stability by changing tol May 18, 2019
@thomasjpfan thomasjpfan changed the title [MRG] "Fixes" test_scale_and_stability by changing tol [MRG] Fixes test_scale_and_stability by changing tol May 18, 2019
@qinhanmin2014
Member

Hmm, I'm surprised to see something like

Max absolute difference: 0.06155424
Max relative difference: 1.

Is there a bug? Will be grateful if you can briefly explain what's happening here.

@qinhanmin2014
Member

I quickly went through the code; it's strange that scaling the same data multiple times introduces such a big difference.

@thomasjpfan
Member Author

This is likely a 32-bit numerical issue, since the tests pass for the other environments.

@thomasjpfan
Member Author

When I set the tolerance too low, it went below np.finfo('f').eps (about 1.19e-07), which caused the 32-bit float comparisons in _nipals_twoblocks_inner_loop to fail.
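A quick check of the numbers involved (illustrative):

```python
import numpy as np

eps32 = np.finfo('f').eps
print(eps32)  # the smallest resolvable relative step in float32

# A convergence tolerance below float32 eps can never be reached:
tol = 1e-8
x = np.float32(1.0)
y = x + np.float32(tol)  # the increment rounds away entirely
print(y == x)  # True: a tol-sized change is invisible in float32
```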

@qinhanmin2014
Member

Heading to bed and thanks for your hard work here :)
Honestly I'm OK to skip the test on 32-bit machines and keep the issue open. This does not seem to be a serious issue.

@thomasjpfan thomasjpfan changed the title [MRG] Fixes test_scale_and_stability by changing tol [WIP] Fixes test_scale_and_stability by changing tol May 19, 2019
@NicolasHug
Member

Quick recap of our pair-debugging session with @thomasjpfan:

  • it's not just Windows or 32-bit: we can reproduce it on Linux and macOS with scipy 1.3
  • it's definitely caused by scipy/scipy#10067 (MAINT: Fix the cutoff value inconsistency for pinv2 and pinvh), which changes the default cond parameter of pinv2: small eigenvalues (smaller than cond) used to be ignored. On 1.3 they are no longer ignored, which produces some extremely large values in the pseudo-inverse matrix Y_pinv (since pinv2 divides by these small eigenvalues).

There are two solutions:

  • Stop relying on the default cond in our calls to pinv2
  • Change the tolerance in the test (the solution we went with)
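The first option would have looked roughly like this (a sketch only; pinv2 has since been removed from SciPy, so numpy.linalg.pinv with an explicit rcond stands in for it, and the 10 * eps cutoff is an assumed value):

```python
import numpy as np

def pinv_with_cutoff(a, rtol=10 * np.finfo(np.float64).eps):
    # Singular values below rtol * max(singular_value) are treated as
    # zero, mimicking the pre-1.3 pinv2 default instead of relying on
    # whatever the library's current default is.
    return np.linalg.pinv(a, rcond=rtol)

A = np.diag([1.0, 1e-17])
print(pinv_with_cutoff(A)[1, 1])  # the 1e-17 singular value is dropped
```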

@thomasjpfan thomasjpfan changed the title [WIP] Fixes test_scale_and_stability by changing tol [MRG] Fixes test_scale_and_stability by changing tol May 20, 2019
@qinhanmin2014
Member

@NicolasHug @thomasjpfan tests are still failing

@thomasjpfan
Member Author

This error stems from how _nipals_twoblocks_inner_loop computes y_weights when Y contains small values with different signs:

X = np.array([[1.0, 2.0],
              [4.0, 3.0]])

# negative y[1, 0]
y_n = np.array([[1.0e-15, 1.0],
                [-1.0e-14, 2.0]])

# positive y[1, 0]
y_p = np.array([[1.0e-15, 1.0],
                [1.0e-14, 2.0]])

_nipals_twoblocks_inner_loop(X, y_n, mode="B")
# array([[-4.06091728e+12], [1.24670160e+00]])

_nipals_twoblocks_inner_loop(X, y_p, mode="B")
# array([[4.22666900e+12], [1.23841402e+00]])

Flipping the sign of a single ~1e-14 entry flips the sign of a ~1e12 weight. The PR fixes this issue by replacing the small values with zero.
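A hedged sketch of that fix: columns of Y whose entries are all near machine epsilon carry only sign noise, so they are zeroed before the inner loop runs (the column-wise test and the 10 * eps threshold here are assumptions, not the exact diff):

```python
import numpy as np

def clip_tiny_columns(Yk):
    # Zero out Y columns made up entirely of near-epsilon values: such a
    # column carries no signal, only sign noise.
    Yk = Yk.copy()
    eps = np.finfo(Yk.dtype).eps
    tiny_cols = np.all(np.abs(Yk) < 10 * eps, axis=0)
    Yk[:, tiny_cols] = 0.0
    return Yk

Y = np.array([[1.0e-16, 1.0],
              [-1.0e-16, 2.0]])
print(clip_tiny_columns(Y))
# [[0. 1.]
#  [0. 2.]]
```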

@jnothman jnothman mentioned this pull request May 21, 2019
@NicolasHug NicolasHug left a comment
Member

So changing cond in the call to pinv2 doesn't work?

I'm fine with this anyway; I don't think there's much more we can do.

@qinhanmin2014 you might want to have another look since this has changed since you approved it.

@jnothman jnothman left a comment
Member

This requires a what's new (0.21.2) entry.

Do the original tests now pass with the small value clipping?

Does the eigenvalue conditioning issue relate to #12145?

@thomasjpfan
Member Author

thomasjpfan commented May 22, 2019

So changing cond in the call to pinv2 doesn't work?

I think we would need two conditions, one for single precision and another for double precision (the double-precision cutoff would need to be scaled up more). This is how it was done in pinv2 before scipy 1.3.0.

Does the eigenvalue conditioning issue relate to #12145?

It is related. The cond parameter in pinv2 determines whether an eigenvalue is "significant". scipy 1.3.0 fixed a bug that made it more inclusive, i.e. smaller eigenvalues are now considered "significant".

Do the original tests now pass with the small value clipping?

Yes. The value clipping forces the eigenvalue to zero, thus forcing pinv2 to consider it "insignificant".
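This can be checked directly (illustrative): zeroing the tiny entries removes the tiny singular value, so any pseudo-inverse cutoff then treats that direction as insignificant.

```python
import numpy as np

Y = np.array([[1.0e-16, 1.0],
              [-1.0e-16, 2.0]])

# Before clipping: a tiny but nonzero second singular value survives.
print(np.linalg.svd(Y, compute_uv=False))

# After clipping, the second singular value collapses to (numerically)
# zero, so no cutoff can mistake it for a significant direction.
Y_clipped = np.where(np.abs(Y) < np.finfo(Y.dtype).eps, 0.0, Y)
print(np.linalg.svd(Y_clipped, compute_uv=False))
```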

@thomasjpfan thomasjpfan reopened this May 22, 2019
@jnothman
Member

If the original tests pass, why have we modified them?

@thomasjpfan thomasjpfan changed the title [MRG] Fixes test_scale_and_stability by changing tol [MRG] Fixes test_scale_and_stability by clipping small values May 22, 2019
@jnothman jnothman merged commit c418761 into scikit-learn:master May 23, 2019
@jnothman
Member

Thanks for the fix, @thomasjpfan


Successfully merging this pull request may close these issues.

test_pls.test_scale_and_stability failure
5 participants