Describe the workflow you want to enable
I would like to save off the results of all runs in GridSearchCV to MLFlow. With MLFlow, logging each run looks roughly like this:
import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet

for param in params:
    with mlflow.start_run():
        est = ElasticNet(**param)
        est.fit(train_x, train_y)
        score = est.score(test_x, test_y)
        mlflow.log_params(param)
        # log_metrics expects a dict of metric name -> value
        mlflow.log_metrics({"r2": score})
        mlflow.sklearn.log_model(est, "model")
See https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html for more details.
I would like to use GridSearchCV to do the above because it comes with many other features (e.g. HalvingGridSearchCV, multi-threading, etc.).
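For context, a minimal sketch of the plain grid search I would like to combine with the logging above (the parameter grid is illustrative):

from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the same kind of `params` as in the loop above
param_grid = {"alpha": [0.1, 1.0], "l1_ratio": [0.2, 0.8]}
search = GridSearchCV(ElasticNet(), param_grid, return_train_score=True)
search.fit(train_x, train_y)
# Only aggregate results (cv_results_) and the refit best_estimator_ are kept,
# so the individual fitted candidates are not available for logging afterwards.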
Describe your proposed solution
A callback parameter to GridSearchCV. Perhaps:
def log_candidate(model, test_x, test_y):
    # log the candidate's parameters, its test score, and the fitted model
    with mlflow.start_run():
        mlflow.log_params(model.get_params())
        mlflow.log_metrics({"r2": model.score(test_x, test_y)})
        mlflow.sklearn.log_model(model, "model")
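Usage might then look like the following (the callback keyword is hypothetical; it is the proposed parameter, not part of the current API):

# Hypothetical API: `callback` is the proposed parameter and does not exist today
search = GridSearchCV(ElasticNet(), param_grid, callback=log_candidate)
search.fit(train_x, train_y)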
Describe alternatives you've considered, if relevant
One can hack the scorer for this purpose, as described in https://danielhnyk.cz/adding-callback-to-a-sklearn-gridsearch/.
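Roughly, the hack amounts to a custom scorer that logs to MLFlow as a side effect (a minimal sketch; the scorer name and logged metric are mine, not from the linked post):

import mlflow

def logging_scorer(estimator, X, y):
    # score the candidate and log it to MLFlow as a side effect
    score = estimator.score(X, y)
    with mlflow.start_run():
        mlflow.log_params(estimator.get_params())
        mlflow.log_metric("score", score)
    return score

search = GridSearchCV(ElasticNet(), param_grid, scoring=logging_scorer)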
This is suboptimal because:
- If you want to record multiple metrics, you cannot save them all from a single callback, because the scoring API expects several separate scorers rather than one function that returns multiple scores.
- Enabling return_train_score will call the scorer callback too many times, and it is not easy to distinguish between the training and testing scoring.
Additional context
No response