ColumnTransformer treats empty arrays differently depending on their representation #19550

@kbattocchi

Description

When specifying the columns for a transformer within a ColumnTransformer, passing [] leaves the ColumnTransformer in a different internal state than passing an all-False boolean mask such as [False, False, ...]. In the former case, the internal transformer has been fitted on an array of shape (n, 0); in the latter case it has not been fitted at all, which can cause downstream problems.

Steps/Code to Reproduce

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = [[1, 2], [3, 4]]
# Empty list of column indices
c1 = ColumnTransformer([('ohe', OneHotEncoder(drop='first'), [])]).fit(X)
# Boolean mask that selects no columns
c2 = ColumnTransformer([('ohe', OneHotEncoder(drop='first'), [False, False])]).fit(X)

c1.get_feature_names()  # succeeds
c2.get_feature_names()  # fails because OneHotEncoder was not fit

Expected Results

In both cases, the internal transformer should have been fit on an empty array, so that the two ways of selecting no columns behave identically.

Actual Results

When using an array of all False values, the internal transformer has not been fit.
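
Until the two representations are handled consistently, one possible workaround is to convert a boolean mask to integer indices before handing it to ColumnTransformer, so that an all-False mask becomes [] and follows the same path as c1 above. The sketch below is only an illustration: `normalize_columns` is a hypothetical helper, not part of scikit-learn, and it assumes the mask is a plain Python list of `bool` values (NumPy boolean arrays would need separate handling).

```python
def normalize_columns(columns):
    """Hypothetical helper: turn a list-of-bool mask into integer indices.

    Any other column specification (integer indices, column names) is
    returned as a plain list with its contents unchanged.
    """
    cols = list(columns)
    # Only treat the spec as a mask if it is non-empty and every entry is a bool.
    if cols and all(isinstance(c, bool) for c in cols):
        return [i for i, keep in enumerate(cols) if keep]
    return cols


# Intended use (requires scikit-learn, shown as a comment only):
# ColumnTransformer(
#     [('ohe', OneHotEncoder(drop='first'), normalize_columns([False, False]))]
# ).fit([[1, 2], [3, 4]])
```

With this helper, `[False, False]` is passed on as `[]`, which is the form reported above as working with `get_feature_names()`.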

Versions

System:
python: 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]
executable: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 21.0.1
setuptools: 51.0.0
sklearn: 0.24.1
numpy: 1.18.5
scipy: 1.5.4
Cython: 0.29.14
pandas: 1.0.5
matplotlib: None
joblib: 1.0.0
threadpoolctl: 2.1.0

Built with OpenMP: True
