Technical requirements
To follow the hands-on tutorials in this chapter, you will need access to the following:
- A Microsoft Azure subscription
- An Azure Databricks workspace
- Azure Databricks notebooks and a Spark cluster
- Access to this book's GitHub repository: https://github.com/PacktPublishing/Optimizing-Databricks-Workload/tree/main/Chapter05
To start off, let's spin up a Spark cluster with the following configuration (a scripted equivalent is sketched after the list):
- Cluster Mode: Standard
- Databricks Runtime Version: 8.3 (includes Apache Spark 3.1.1, Scala 2.12)
- Autoscaling: Disabled
- Automatic Termination: After 30 minutes of inactivity
- Worker Type: Standard_DS3_v2
- Number of Workers: 1
- Spot Instances: Disabled
- Driver Type: Standard_DS3_v2
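If you prefer to script the setup rather than click through the UI, the same cluster can be created with the Databricks Clusters REST API (`POST /api/2.0/clusters/create`). The following is a minimal sketch, assuming a hypothetical workspace URL, cluster name, and personal access token (generated under User Settings); the field names mirror the UI options listed above:

```python
import requests

# Hypothetical values -- replace with your own workspace URL and a personal
# access token generated from User Settings in your Databricks workspace.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<personal-access-token>"

# Cluster spec mirroring the configuration listed above. Standard cluster
# mode is the default when no special cluster profile is set, and a fixed
# num_workers (rather than an autoscale range) disables autoscaling.
cluster_spec = {
    "cluster_name": "chapter-05-cluster",
    "spark_version": "8.3.x-scala2.12",        # DBR 8.3 (Spark 3.1.1, Scala 2.12)
    "node_type_id": "Standard_DS3_v2",         # worker type
    "driver_node_type_id": "Standard_DS3_v2",  # driver type
    "num_workers": 1,
    "autotermination_minutes": 30,
    "azure_attributes": {
        "availability": "ON_DEMAND_AZURE"      # on-demand VMs, so no spot instances
    },
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=cluster_spec,
)
response.raise_for_status()
print("Created cluster:", response.json()["cluster_id"])
```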
Now, create a new notebook and attach it to the newly created cluster to get started!
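Once the notebook is attached, a quick check such as the following confirms that the cluster is running the expected runtime (`spark` is the SparkSession that Databricks predefines in every notebook, so no imports are needed):

```python
# `spark` is predefined in Databricks notebooks.
print(spark.version)  # expect "3.1.1" on Databricks Runtime 8.3
```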