As discussed in #19527, fitting models on data with a constant feature can be surprising. For instance, a `StandardScaler(with_mean=False)` fit on a column with constant values set to 1000 will let those values pass through unchanged, because the variance of the column is zero. It can be surprising, but is this a problem? Should we warn the user about the presence of such constant features, which are typically not predictive for machine learning models?
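A minimal reproduction of the pass-through behavior (assuming a recent scikit-learn, where zero-variance features get their `scale_` clamped to 1):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A single feature whose values are all 1000: its variance is zero.
X = np.full((5, 1), 1000.0)

scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X)

# The zero-variance column has scale_ clamped to 1, so the constant
# values pass through unchanged.
print(scaler.scale_)      # [1.]
print(X_scaled.ravel())   # [1000. 1000. 1000. 1000. 1000.]
```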
Which estimators should warn about such constant features? The scalers can naturally detect them when computing the `scale_` attribute. `QuantileTransformer` could also probably warn about this degenerate case. `HistGradientBoosting*` and `KBinsDiscretizer` can also do it efficiently when binning the feature values.
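For the scalers, such a check is essentially free once the fit statistics are computed. A minimal sketch of a post-fit detection (the warning wording is hypothetical, not a proposed final message):

```python
import warnings

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1000.0, 1.0],
              [1000.0, 2.0],
              [1000.0, 3.0]])

scaler = StandardScaler(with_mean=False).fit(X)

# Constant columns are exactly the zero-variance ones, which the scaler
# already identified while computing var_ / scale_.
constant_idx = np.flatnonzero(scaler.var_ == 0)
if constant_idx.size:
    warnings.warn(
        f"Features {constant_idx.tolist()} are constant and will be "
        "passed through unchanged."  # hypothetical message
    )
```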
If we do so:
- What should be the warning message? Should it be the same for all the models?
- Shall we add a standard constructor param to these estimators, `constant_feature={'warn', 'drop', 'passthrough', 'zero', 'one'}`, with `"warn"` as the default? (See the sketch after this list.)
- Should we generalize this to all estimators? (ogrisel: probably not, because it would add an expensive and redundant input validation check, so we could restrict it to the estimators above where it is cheap to check.)
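To make the second option concrete, here is a hypothetical sketch of the `constant_feature` semantics written as a standalone transformer; the class name and exact behavior are illustrative only, not an agreed design:

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ConstantFeatureHandler(TransformerMixin, BaseEstimator):
    """Illustrative implementation of the proposed constant_feature
    options: 'warn', 'drop', 'passthrough', 'zero', 'one'."""

    def __init__(self, constant_feature="warn"):
        self.constant_feature = constant_feature

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Detect zero-variance (constant) columns once, at fit time.
        self.constant_mask_ = X.var(axis=0) == 0
        if self.constant_feature == "warn" and self.constant_mask_.any():
            warnings.warn(
                f"Features {np.flatnonzero(self.constant_mask_).tolist()} "
                "are constant."
            )
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.constant_feature == "drop":
            return X[:, ~self.constant_mask_]
        if self.constant_feature in ("zero", "one"):
            X = X.copy()
            X[:, self.constant_mask_] = (
                0.0 if self.constant_feature == "zero" else 1.0
            )
        # 'warn' and 'passthrough' leave the values unchanged.
        return X
```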
Are there legitimate cases where such a warning would be frequent and annoying? For instance, `StandardScaler(with_mean=False)` after a `OneHotEncoder` with dense output, when a categorical feature has one category that is significantly more frequent than the others, inside a cross-validation loop? A similar problem could happen after an `OrdinalEncoder`. But would `StandardScaler(with_mean=False)` actually make sense to use in those cases?
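A sketch of how this could happen: in a small training split of a cross-validation loop, a rare category may be absent, leaving constant one-hot columns (assumes scikit-learn >= 1.2 for the `sparse_output` parameter):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical CV training fold where only the frequent category "a"
# appears; the rare category "b" is absent from this split.
X_fold = np.array([["a"], ["a"], ["a"], ["a"]])

enc = OneHotEncoder(
    categories=[["a", "b"]], sparse_output=False, handle_unknown="ignore"
)
Xt = enc.fit_transform(X_fold)
print(Xt)
# [[1. 0.]
#  [1. 0.]
#  [1. 0.]
#  [1. 0.]]  <- both columns are constant in this fold
```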
List of estimators to consider:
- scalers (such as `StandardScaler`, `RobustScaler`, `MinMaxScaler`, ...),
- estimators that do feature binning: `HistGradientBoosting*` and `KBinsDiscretizer`,
- feature selectors such as `SelectKBest`.
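For reference, dropping constant features is already possible today with `VarianceThreshold`, whose default threshold removes exactly the zero-variance columns; a `'drop'` option could mirror this behavior:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[1000.0, 1.0],
              [1000.0, 2.0],
              [1000.0, 3.0]])

# The default threshold of 0.0 removes only zero-variance features.
print(VarianceThreshold().fit_transform(X))
# [[1.]
#  [2.]
#  [3.]]
```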