Understanding AutoML in Databricks
Databricks AutoML takes a glass-box approach. Whether you run it through the UI or through the supported Python API, it logs every combination of model and hyperparameters (a trial) as an MLflow run and generates a Python notebook containing the source code for each trial. The results of all trials are logged to the MLflow tracking server, so each trial can be compared and reproduced. Because you have access to the source code, you can easily modify it and rerun a trial. We will look at this in more detail when we go over the example.
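To make the glass-box idea concrete, here is a minimal sketch of starting an experiment from the Python API and then listing its trials alongside their MLflow run IDs and generated notebooks. It assumes a Databricks ML runtime where the `databricks.automl` package is available, and that the returned summary exposes `trials` with `mlflow_run_id`, `notebook_path`, and `metrics` attributes; check these names against the documentation for your runtime version. The helper names `run_automl_classification` and `rank_trials` are hypothetical and exist only for illustration.

```python
def run_automl_classification(dataset, target_col, timeout_minutes=30):
    """Start an AutoML classification experiment.

    Assumption: this runs on a Databricks ML runtime. The import is done
    lazily because the package is not installable outside Databricks.
    """
    from databricks import automl

    return automl.classify(
        dataset=dataset,
        target_col=target_col,
        timeout_minutes=timeout_minutes,
    )


def rank_trials(summary, metric_name, higher_is_better=True):
    """Return (run_id, notebook_path, score) tuples, best trial first.

    Assumption: each trial carries a `metrics` dict keyed by metric name.
    Trials that did not record `metric_name` are skipped.
    """
    rows = []
    for trial in summary.trials:
        score = trial.metrics.get(metric_name)
        if score is None:
            continue
        rows.append((trial.mlflow_run_id, trial.notebook_path, score))
    return sorted(rows, key=lambda row: row[2], reverse=higher_is_better)
```

In a Databricks notebook, usage might look like `summary = run_automl_classification(df, "label")` followed by `rank_trials(summary, "val_f1_score")`, where `df`, the `"label"` column, and the `"val_f1_score"` metric key are placeholders. Each returned notebook path points at the generated source for that trial, which is what you would open, edit, and rerun.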
Databricks AutoML also prepares the dataset for training and then performs model training and hyperparameter tuning on the Databricks cluster. Keep in mind that Databricks AutoML distributes hyperparameter tuning trials across the cluster. A trial is a unique configuration of hyperparameters associated with the...