Description
When specifying the columns a transformer applies to within a ColumnTransformer, passing [] produces a ColumnTransformer with a different internal state than passing an all-False boolean mask such as [False, False, ...]. In the former case the internal transformer has been fitted on an array of shape (n, 0); in the latter it has not been fitted at all, which can cause downstream problems.
Steps/Code to Reproduce
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
c1 = ColumnTransformer([('ohe', OneHotEncoder(drop='first'), [])]).fit([[1, 2],[3, 4]])
c2 = ColumnTransformer([('ohe', OneHotEncoder(drop='first'), [False, False])]).fit([[1, 2],[3, 4]])
c1.get_feature_names() # succeeds
c2.get_feature_names() # fails because OneHotEncoder was not fit
Expected Results
In both cases, the internal transformer should have been fitted on an empty array.
Actual Results
When using a list of all False values, the internal transformer has not been fitted.
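A possible workaround, until the two cases behave the same, is to convert a boolean mask into integer column positions before handing it to ColumnTransformer, so that an all-False mask becomes [] and takes the code path that succeeds above. The helper below is a minimal sketch and not part of scikit-learn: the name mask_to_indices is hypothetical, and it only handles plain Python lists or tuples of bool (NumPy boolean arrays are returned unchanged).

```python
def mask_to_indices(columns):
    """Turn a plain-Python boolean mask into a list of integer positions.

    Hypothetical helper, not a scikit-learn API. Any other column
    specifier (names, integers, slices, callables, NumPy arrays, or an
    empty list) is returned unchanged.
    """
    is_bool_mask = (
        isinstance(columns, (list, tuple))
        and len(columns) > 0
        and all(isinstance(c, bool) for c in columns)
    )
    if is_bool_mask:
        # Keep the positions whose mask entry is True.
        return [i for i, keep in enumerate(columns) if keep]
    return columns


all_false = mask_to_indices([False, False])       # [] (same as passing [] directly)
some_true = mask_to_indices([True, False, True])  # [0, 2]
```

With this, the second reproduction case could pass mask_to_indices([False, False]) as its column specifier, which is the same as passing [] directly. This only sidesteps the inconsistency and does not fix it.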
Versions
System:
python: 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]
executable: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe
machine: Windows-10-10.0.19041-SP0
Python dependencies:
pip: 21.0.1
setuptools: 51.0.0
sklearn: 0.24.1
numpy: 1.18.5
scipy: 1.5.4
Cython: 0.29.14
pandas: 1.0.5
matplotlib: None
joblib: 1.0.0
threadpoolctl: 2.1.0
Built with OpenMP: True