Generating data profiles with AutoML
We introduced Databricks AutoML in Chapter 1. This tool automates ML development and augments data science workflows. AutoML is best known for generating models, but we’ll get to modeling in Chapter 6. Since we’re talking about getting to know your data, we first want to focus on one extremely useful feature built into AutoML that often flies under the radar: autogenerated Python notebooks. In addition to the notebook code for every experiment it runs, AutoML provides a notebook for data exploration. We’ll jump right into creating an AutoML experiment and view the data exploration code, then return to the modeling portion later.
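To make the idea of a data profile concrete before opening the generated notebook, here is a minimal sketch of the kind of per-column summary a profile typically reports. It uses plain pandas and is not the code AutoML generates; the helper name profile_columns and the toy sales table are hypothetical, invented only for illustration.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with a few basic profile statistics."""
    summary = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),  # distinct non-null values
        }
    )
    # A mean only makes sense for numeric columns; other columns get NaN.
    summary["mean"] = df.mean(numeric_only=True)
    return summary


# Hypothetical toy data, invented for illustration only.
sales = pd.DataFrame(
    {
        "store_nbr": [1, 1, 2, 2],
        "family": ["DAIRY", "BREAD", "DAIRY", None],
        "sales": [10.0, None, 7.5, 3.0],
    }
)

profile = profile_columns(sales)
print(profile)
```

A generated data exploration notebook usually goes well beyond this, but the table above captures the basic questions a profile answers: what type each column is, how much of it is missing, and how many distinct values it holds.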
We’ll cover how to create an AutoML experiment via an API in the Favorita project notebooks. For now, we encourage you to follow the instructions here to set up a simple regression experiment with the AutoML UI so that we can look at the data profile it creates. Before you begin, make sure you...