DOC add FAQ entry for the many linear model classes #19861
and not at test time, for resampling and similar uses,
like at `imbalanced-learn`.
In general, these use cases can be solved
with a custom meta estimator rather than a Pipeline.

Why are there so many different estimators for linear models?
-------------------------------------------------------------
Usually, there is one classifier and one regressor per model type, e.g.
:class:`~ensemble.GradientBoostingClassifier` and
:class:`~ensemble.GradientBoostingRegressor`. Both have similar options and
both have the parameter `loss`, which is especially useful in the regression
case as it enables the estimation of the conditional mean as well as
conditional quantiles.

For linear models, there are many estimator classes which are very close to
each other. Let us have a look at:

- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.Ridge`, L2 penalty
- :class:`~linear_model.Lasso`, L1 penalty (sparse models)
- :class:`~linear_model.ElasticNet`, L1 + L2 penalty (less sparse models)
- :class:`~linear_model.SGDRegressor` with `loss='squared_loss'`

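How close these estimators are can be illustrated directly. This is a minimal sketch on synthetic data; the `alpha` values are arbitrary choices for illustration.

```python
# Minimal sketch: LinearRegression coincides with Ridge at alpha=0 (no
# penalty), and ElasticNet with l1_ratio=1 coincides with Lasso.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge0 = Ridge(alpha=0.0).fit(X, y)         # alpha=0 removes the L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)          # L1 penalty shrinks some coefs to 0
enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)  # l1_ratio=1 is pure L1

print(np.allclose(ols.coef_, ridge0.coef_, atol=1e-6))  # -> True
print(np.allclose(lasso.coef_, enet.coef_, atol=1e-6))  # -> True
```

Despite solving (nearly) the same problems here, each class uses a dedicated solver tuned to its penalty, which is the point of the discussion below.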
**Maintainer perspective:**
In principle, they all do the same thing and differ only in the penalty they
impose. This, however, has a large impact on the way the underlying
optimization problem is solved. In the end, it amounts to the usage of
different methods and tricks from linear algebra. A special case is
`SGDRegressor`, which comprises all four of the previous models and differs
only in its optimization procedure. A further side effect is that the
different estimators favor different data layouts (`X` C-contiguous or
F-contiguous, sparse CSR or CSC). This complexity of the seemingly simple
linear models is the reason for having different estimator classes for
different penalties.

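The special role of `SGDRegressor` can be sketched as follows. This is a hypothetical illustration on synthetic data: the no-penalty case is approximated with `alpha=0.0`, and the tolerance in the comparison is generous because stochastic gradient descent only approximates the exact solvers.

```python
# Sketch: SGDRegressor's `penalty` parameter spans the same model family
# as the four dedicated classes, differing only in the optimizer.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.0, -2.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

models = {
    "LinearRegression": SGDRegressor(penalty="l2", alpha=0.0),   # no penalty
    "Ridge": SGDRegressor(penalty="l2", alpha=1e-4),
    "Lasso": SGDRegressor(penalty="l1", alpha=1e-4),
    "ElasticNet": SGDRegressor(penalty="elasticnet", alpha=1e-4, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X, y)
    # With a weak penalty, every variant roughly recovers the true coefficients.
    print(name, np.round(model.coef_, 2))
```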
**User perspective:**
First, the current design is inspired by the scientific literature, where
linear regression models with different regularization/penalty terms were
given different names, e.g. *ridge regression*. Having different model classes
with corresponding names makes it easier for users to find those regression
models.
Secondly, if all five of the above-mentioned linear models were unified into a
single class, there would be parameters with a lot of options, like the
``solver`` parameter. On top of that, there would be a lot of mutually
exclusive interactions between different parameters. For example, the possible
options of the parameters ``solver``, ``precompute`` and ``selection`` would
depend on the chosen values of the penalty parameters ``alpha`` and
``l1_ratio``.
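These parameter interactions can be checked directly from the estimators' constructor signatures; this small sketch only inspects parameters, it does not fit anything.

```python
# Sketch: each penalty comes with its own solver-related options, so a
# unified class would carry parameters that are only valid for some
# penalty settings.
from sklearn.linear_model import Lasso, Ridge

print("solver" in Ridge().get_params())     # True:  Ridge exposes `solver`
print("solver" in Lasso().get_params())     # False: Lasso has no `solver`
print("selection" in Lasso().get_params())  # True:  coordinate-descent option
print("selection" in Ridge().get_params())  # False: not meaningful for Ridge
```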