Closed
Describe the bug
#19527 introduced a regression in StandardScaler for data with small magnitudes.
Steps/Code to Reproduce
In MNE-Python some of our data channels have magnitudes in the ~1e-13 range. On 638b768 or before, this code (which uses random data of different scales) returns all True, which seems correct:
import numpy as np
from sklearn.preprocessing import StandardScaler

for scale in (1e15, 1e10, 1e5, 1, 1e-5, 1e-10, 1e-15):
    data = np.random.RandomState(0).rand(1000, 4) - 0.5
    data *= scale
    scaler = StandardScaler(with_mean=True, with_std=True)
    X = scaler.fit_transform(data)
    stds = np.std(data, axis=0)
    means = np.mean(data, axis=0)
    print(np.allclose(X, (data - means) / stds, rtol=1e-7, atol=1e-7 * scale))
But on c748e46 (after #19527), anything "too small" starts to fail: I get five True values, followed by False for the last two scale factors (1e-10 and 1e-15). Hence StandardScaler no longer standardizes data at those magnitudes.
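For reference, here is a minimal NumPy-only sketch of what "standardized" means in the check above. The helper names `manual_standardize` and `is_standardized` are hypothetical and only for illustration; they are not part of scikit-learn. The sketch assumes float64 data with non-constant columns.

```python
import numpy as np


def manual_standardize(data):
    """Reference standardization: subtract column means, divide by column stds."""
    means = np.mean(data, axis=0)
    stds = np.std(data, axis=0)
    return (data - means) / stds


def is_standardized(X, tol=1e-7):
    """Return True if every column of X has mean ~0 and standard deviation ~1."""
    zero_mean = np.allclose(np.mean(X, axis=0), 0.0, atol=tol)
    unit_std = np.allclose(np.std(X, axis=0), 1.0, rtol=tol, atol=0.0)
    return bool(zero_mean and unit_std)


# Same random data and scale factors as in the reproduction script.
base = np.random.RandomState(0).rand(1000, 4) - 0.5
scales = (1e15, 1e10, 1e5, 1, 1e-5, 1e-10, 1e-15)

# The reference result should not depend on the input scale.
results = {scale: is_standardized(manual_standardize(base * scale)) for scale in scales}
print(results)
```

Passing the output of `scaler.fit_transform(data)` to `is_standardized` gives a scale-independent way to confirm whether a given scikit-learn build is affected, without having to choose an `atol` that depends on `scale`.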
cc @ogrisel since this came from your PR and @maikia @rth @agramfort since you approved the PR