Description
Describe the issue linked to the documentation
In the documentation the input array for the StandardScaler is given by:
scale_ : ndarray of shape (n_features,) or None
I have the following example where I transform an int array and a float array:
from sklearn.preprocessing import StandardScaler
import numpy as np
x = np.array([
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 8, 0, 1, 0, 0],
    [1, 4, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 4, 0, 1, 0, 1]])
preprocess_fn = StandardScaler()
x[:,[1]] = preprocess_fn.fit_transform(x[:,[1]])
print(x)
[[1 0 1 0 1 0]
[1 0 1 0 1 0]
[0 1 0 1 0 0]
[1 0 1 1 0 0]
[0 0 0 0 1 0]
[0 0 0 1 0 1]]
In this case I wanted to transform column 1 of my array. The result looks as if the numbers were rounded or converted to ints.
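The loss of precision appears to come from the assignment back into the integer array rather than from the scaler itself. The following is a minimal sketch using plain NumPy only; it standardizes column 1 by hand with the population standard deviation, which is assumed here to match what StandardScaler computes by default. The names `col`, `scaled`, and `target` are illustrative and not part of the original report.

```python
import numpy as np

col = np.array([1, 1, 8, 4, 1, 4])   # column 1 of x, integer dtype

# Standardize by hand: zero mean, unit (population) standard deviation.
scaled = (col - col.mean()) / col.std()
print(scaled.dtype)                   # float64
print(np.round(scaled, 3))

# Assigning floats into an integer array casts them, truncating toward zero.
target = col.copy()
target[:] = scaled
print(target)                         # [0 0 1 0 0 0]
```

The truncated values match column 1 of the printed array above, which suggests the scaled floats were computed correctly and then cut off when stored in `x`.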
x[:,[1]] = preprocess_fn.inverse_transform(x[:,[1]])
print(x)
When I call inverse_transform, I get:
File "../test.py", line 16, in <module>
x[:,[1]] = preprocess_fn.inverse_transform(x[:,[1]])
File "../.local/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 934, in inverse_transform
X *= self.scale_
numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
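The same error can be reproduced without scikit-learn. This is a small sketch, assuming only NumPy; `scale` is a hypothetical stand-in for `self.scale_`, and the value 2.5 is arbitrary. It contrasts the in-place multiplication, which must write float results into an integer array, with the out-of-place form, which allocates a new float array.

```python
import numpy as np

scale = np.array([2.5])               # stand-in for self.scale_ (float64)
X = np.array([[0], [0], [1]])         # integer input, as in the report

try:
    X *= scale                        # in-place: result must fit X's int dtype
    in_place_failed = False
except TypeError as exc:              # UFuncTypeError subclasses TypeError
    in_place_failed = True
    print(type(exc).__name__)

Y = X * scale                         # out-of-place: allocates a new float array
print(Y.dtype)                        # float64
```

The out-of-place form avoids the exception, but the result is only useful if it is kept in a float array; writing it back into an integer array truncates it again.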
I would say this can be solved by replacing the in-place multiplication with X = X * self.scale_,
but doing this by hand gives me the following array:
[[1 0 1 0 1 0]
[1 0 1 0 1 0]
[0 2 0 1 0 0]
[1 0 1 1 0 0]
[0 0 0 0 1 0]
[0 0 0 1 0 1]]
This differs from the original array. If I cast the array to float from the beginning, I don't have this problem.
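For reference, here is a sketch of the float-based workaround using the same data as above. The only change from the original example is the explicit `dtype=np.float64`; the names `scaler` and `original` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same data as in the report, but stored as floats from the start.
x = np.array([
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 8, 0, 1, 0, 0],
    [1, 4, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 4, 0, 1, 0, 1]], dtype=np.float64)
original = x.copy()

scaler = StandardScaler()
x[:, [1]] = scaler.fit_transform(x[:, [1]])      # scaled floats are stored as-is
x[:, [1]] = scaler.inverse_transform(x[:, [1]])  # no cast error on float input

print(np.allclose(x, original))
```

With a float array, the scaled values survive the assignment, so the round trip should recover the original column up to floating-point error.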
It was not clear to me from the documentation whether only float arrays are allowed. I am not sure this is only a documentation problem, though: either the error in the traceback should be fixed as well, or there should be a check for whether the array has an int dtype.
Suggest a potential alternative/fix
scale_ : ndarray of shape (n_features,) or None and type float.
System:
- sklearn 0.24.0
- python 3.9.1
- numpy 1.19.5