Running AutoML on our churn prediction dataset
Let’s take a look at how to use Databricks AutoML with our bank customer churn prediction dataset.
If you executed the notebooks from Chapter 3, Utilizing the Feature Store, you will have the raw data available as a Delta table named raw_data in your Hive metastore. In the Chapter 3 code, we read a CSV file of raw data from our Git repository, wrote it as a Delta table, and registered it in our integrated metastore. Take a look at cmd 15 in your notebook. In your own environment, the dataset might instead come from another data pipeline or be uploaded directly to the Databricks workspace using the Upload file functionality.
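As a reminder of what that Chapter 3 step does, the read, write, and register flow can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: the function name write_csv_as_delta_table and the example path in the usage note are hypothetical, and spark is assumed to be the SparkSession that Databricks notebooks provide by default.

```python
def write_csv_as_delta_table(spark, csv_path, table_name="raw_data"):
    """Read a CSV file and register it as a Delta table in the metastore.

    Assumptions (not taken from the Chapter 3 notebook):
    - `spark` is the SparkSession predefined in Databricks notebooks.
    - `csv_path` points to a CSV file with a header row.
    """
    df = (
        spark.read
        .option("header", "true")       # first row holds the column names
        .option("inferSchema", "true")  # let Spark infer the column types
        .csv(csv_path)
    )
    # saveAsTable writes the data in Delta format and registers the table
    # under `table_name` in the current database of the metastore.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df
```

In a notebook attached to a running cluster, you might call it as write_csv_as_delta_table(spark, "/path/to/churn.csv"), replacing the placeholder path with the location of your own file. The overwrite mode is a choice made here so that the cell can be rerun safely; use a different mode if you need to keep existing data.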
To view the tables, you need to have your cluster up and running.
Figure 5.1 – The location of the raw dataset
Let’s create our first Databricks AutoML experiment.
Important note
Make sure that before following the next steps, you have a cluster up and running...