Building a machine learning app with Databricks and Azure Data Lake Storage
In addition to ETL/ELT jobs, data engineers often help data scientists productionize machine learning applications. Databricks is an excellent way to simplify a data scientist's work and to build data preprocessing pipelines.
As we saw in the previous recipe, ADF can trigger the execution of notebooks, JAR files, and Python files, so parts of the application logic can be implemented there.
A Databricks cluster uses its own filesystem, the Databricks File System (DBFS). To read input data and write result files, we need to mount Azure Data Lake Storage to DBFS.
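As a preview, the following is a minimal sketch of such a mount, assuming an ADLS Gen2 account accessed with an Azure AD service principal via OAuth 2.0; the storage account, container, mount point, and secret scope names are placeholders, not values from this recipe:

```python
# Runs in a Databricks notebook, where `dbutils` and `spark` are predefined.
# All <...> values and the mount point are assumptions for illustration.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container so it is reachable under /mnt/datalake in DBFS.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
```

Once mounted, the container's contents can be read and written like any other DBFS path, for example with `dbutils.fs.ls("/mnt/datalake")`.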
In this recipe, we will connect Azure Data Lake Storage to Databricks, ingest the MovieLens dataset, train a basic model for a recommender system, and store the model in Azure Data Lake Storage.
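To make the end goal concrete before we begin, here is a hedged sketch of the training and persistence steps using Spark ML's ALS algorithm, a common choice for a basic recommender; the file path, model path, and column names assume the standard MovieLens `ratings.csv` layout and the mount point used above, and may differ from the recipe's exact setup:

```python
from pyspark.ml.recommendation import ALS

# Load MovieLens ratings from the mounted data lake path (assumed location).
ratings = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/datalake/movielens/ratings.csv")
)

# Train a basic collaborative-filtering model; coldStartStrategy="drop"
# avoids NaN predictions for users or items unseen during training.
als = ALS(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    coldStartStrategy="drop",
)
model = als.fit(ratings)

# Persist the trained model back to Azure Data Lake Storage via the mount.
model.write().overwrite().save("/mnt/datalake/models/movielens-als")
```

The recipe walks through each of these steps in detail.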
Getting ready
First, log in to your Microsoft Azure account.
We assume you have a pre-configured resource group and storage account with Azure...