Optimizing model selection with scikit-learn, Hyperopt, and MLflow
As we saw in the previous sections, Hyperopt is a Python library for hyperparameter tuning that lets us track optimization runs and that can be used in distributed computing environments such as Azure Databricks. In this section, we will go through an example of training a scikit-learn model. We will use Hyperopt to track the tuning process and log the results to MLflow, the model life cycle management platform.
In Azure Databricks Runtime for Machine Learning, we have an optimized version of Hyperopt at our disposal that supports MLflow tracking. Here, we can use SparkTrials objects to log the results of the tuning process of single-machine models during parallel executions. We will use these tools to find the best set of hyperparameters for several scikit-learn models.
We will do the following:
- Prepare the training dataset.
- Use Hyperopt to define the objective function to be minimized. ...