Technical requirements
To follow the hands-on tutorials in this chapter, the following are required:
- A Microsoft Azure subscription
- An Azure Databricks workspace
- Azure Databricks notebooks and a Spark cluster
- Access to this book's GitHub repository: https://github.com/PacktPublishing/Optimizing-Databricks-Workload/tree/main/Chapter07
To start off, let's spin up a Spark cluster with the following configuration (a sketch for creating the same cluster programmatically follows the list):
- Cluster Name: packt-cluster
- Cluster Mode: Standard
- Databricks Runtime Version: 8.3 (includes Apache Spark 3.1.1, Scala 2.12)
- Autoscaling: Disabled
- Automatic Termination: After 30 minutes of inactivity
- Worker Type: Standard_DS3_v2
- Number of workers: 2
- Spot instances: Disabled
- Driver Type: Standard_DS3_v2
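If you prefer to script the setup rather than click through the UI, the same configuration can be submitted to the Databricks Clusters API. The following is a minimal sketch, assuming you replace the placeholder workspace URL and personal access token with your own values; the runtime string follows the usual `<runtime>.x-scala<version>` naming convention for Databricks Runtime 8.3:

```python
import requests

# Placeholders -- replace with your workspace URL and a personal access
# token generated under User Settings in the Databricks workspace.
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
DATABRICKS_TOKEN = "<your-personal-access-token>"

# Cluster spec mirroring the configuration listed above.
cluster_spec = {
    "cluster_name": "packt-cluster",
    "spark_version": "8.3.x-scala2.12",    # Databricks Runtime 8.3
    "node_type_id": "Standard_DS3_v2",     # worker type
    "driver_node_type_id": "Standard_DS3_v2",
    "num_workers": 2,                      # fixed size, so autoscaling is off
    "autotermination_minutes": 30,
    "azure_attributes": {
        "availability": "ON_DEMAND_AZURE"  # on-demand VMs, i.e., spot disabled
    },
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=cluster_spec,
)
response.raise_for_status()
print("Created cluster:", response.json()["cluster_id"])
```

The same JSON payload can also be passed to the Databricks CLI's `databricks clusters create` command if you have the CLI configured.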
Now, create a new notebook and attach it to the newly created cluster to get started!
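Once the notebook is attached, a quick sanity check confirms you are running on the intended cluster. A minimal sketch, using the `spark` and `sc` handles that Databricks notebooks predefine:

```python
# Run in the first cell of the new notebook.
print(spark.version)          # expect 3.1.1 on Databricks Runtime 8.3
print(sc.defaultParallelism)  # total cores available across the two workers
```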