Searching for a Signal
In this chapter, we’ll cover how to use data science to search for a signal hidden in the noise of data.
We will build on the features we created within the Databricks platform during the previous chapter. We start by using automated machine learning (AutoML) for a basic modeling approach, which provides autogenerated code and quickly enables data scientists to establish a baseline model to beat. When searching for a signal, we experiment with different features, hyperparameters, and models. Historically, tracking these configurations and their corresponding evaluation metrics has been a time-consuming project in and of itself. A low-overhead tracking mechanism reduces the burden of manually capturing configurations. One such mechanism is the tracking provided by MLflow, an open source platform for managing data science projects and supporting ML operations (MLOps). More specifically, we’ll introduce MLflow Tracking, an MLflow component that significantly improves...