Creating a Databricks workspace
A Databricks workspace is an environment that contains Databricks assets such as the following:
- Notebooks: A notebook is an interface that contains a series of runnable commands. It holds code, visualizations, and narrative text.
- Libraries: Packages of code, either third-party or locally built, that can be used in notebooks.
- Experiments: Used primarily for machine learning, they let you organize and visualize MLflow runs.
- Clusters: Sets of Azure virtual machines that provide compute. They execute the code we write in notebooks.
- Jobs: A job runs Databricks commands without using the notebook UI. A job is triggered by a scheduler or by a service such as Azure Data Factory.
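Before exploring these assets, the workspace itself has to exist. As a rough sketch of what the recipe that follows accomplishes, a workspace can also be provisioned from the Azure CLI; the resource group name, workspace name, and region below are placeholder values, and the `az databricks` commands assume the Databricks CLI extension is installed:

```shell
# Placeholder names and region -- substitute your own values.
# Requires: az extension add --name databricks

# Create a resource group to hold the workspace.
az group create --name rg-databricks-demo --location eastus

# Create the Databricks workspace (SKU can be standard, premium, or trial).
az databricks workspace create \
  --resource-group rg-databricks-demo \
  --name dbw-demo \
  --location eastus \
  --sku standard
```

Once the command completes, the workspace appears in the Azure portal and can be launched from its overview page.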
Now that we have a better idea of the Databricks components, let's dig in.
Getting ready
If you are using a trial Azure subscription, you will need to upgrade it to a Pay-As-You-Go subscription. Azure Databricks requires eight cores of...