
GridSearchCV support callback for MLFlow  #26395

@tianhuil

Description


Describe the workflow you want to enable

I would like to save the results of every candidate run in GridSearchCV to MLflow. The MLflow tutorial suggests a pattern like this:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet

for param in params:
    with mlflow.start_run():
        est = ElasticNet(**param)
        est.fit(train_x, train_y)
        # est.score returns a float; log_metrics expects a dict of metrics
        metrics = {"score": est.score(test_x, test_y)}
        mlflow.log_params(param)
        mlflow.log_metrics(metrics)
        mlflow.sklearn.log_model(est, "model")

See https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html for more details.

I would like to use GridSearchCV to do the above because it comes with many other features (e.g. successive halving via HalvingGridSearchCV, parallelism via n_jobs, etc.).
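
For context, the closest I can get today is to replay cv_results_ into MLflow after the search has finished. A minimal sketch (the param_grid values and the synthetic data are made up for illustration); note that only parameters and aggregated cross-validation scores are recoverable this way, since the search discards the per-candidate fitted models:

import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

train_x, train_y = make_regression(n_samples=200, n_features=5, random_state=0)
param_grid = {"alpha": [0.1, 1.0], "l1_ratio": [0.2, 0.8]}

# Fit the search, then log one MLflow run per candidate after the fact.
search = GridSearchCV(ElasticNet(), param_grid).fit(train_x, train_y)
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metric("mean_test_score", score)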

Describe your proposed solution

A callback parameter to GridSearchCV. Perhaps something like:

def log_candidate(model, test_x, test_y):
    with mlflow.start_run():
        mlflow.log_params(model.get_params())
        mlflow.log_metrics({"score": model.score(test_x, test_y)})
        mlflow.sklearn.log_model(model, "model")
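
Usage might then look like this (callback is a hypothetical parameter; it does not exist in GridSearchCV today), with the search invoking the callback once per fitted candidate:

search = GridSearchCV(ElasticNet(), param_grid, callback=log_candidate)
search.fit(train_x, train_y)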

Describe alternatives you've considered, if relevant

To hack the scorer for this purpose (see the sketch after the list below): https://danielhnyk.cz/adding-callback-to-a-sklearn-gridsearch/

This is suboptimal because:

  1. You cannot save multiple metrics per candidate through the provided API: scoring accepts multiple scorers, each returning a single score, rather than one callable that returns several metrics.
  2. Enabling return_train_score calls the scorer additional times (once more per split, on the training data), and from inside the scorer there is no way to distinguish a training evaluation from a testing one.
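
For illustration, the scorer hack looks roughly like this (a sketch assuming a single r2 metric; the two problems above follow directly from the scorer being the only hook available):

import mlflow
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV

def logging_scorer(estimator, X, y):
    # Invoked once per (candidate, fold) pair -- and again on the training
    # split when return_train_score=True -- with nothing in its arguments
    # to tell those calls apart.
    score = r2_score(y, estimator.predict(X))
    with mlflow.start_run():
        mlflow.log_params(estimator.get_params())
        mlflow.log_metric("r2", score)  # only one scalar can be returned
    return score

search = GridSearchCV(ElasticNet(), {"alpha": [0.1, 1.0]},
                      scoring=logging_scorer)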

Additional context

No response
