Technical requirements
To follow the hands-on tutorials in this chapter, you will need the following:
- An Azure subscription
- An Azure Databricks workspace
- An Azure Databricks notebook attached to a Spark cluster
- Access to this book's GitHub repository at https://github.com/PacktPublishing/Optimizing-Databricks-Workload/tree/main/Chapter06
To start, let's spin up a Spark cluster with the following configuration (a scripted equivalent is sketched after the list):
- Cluster Name: packt-cluster
- Cluster Mode: Standard
- Databricks Runtime Version: 8.3 (includes Apache Spark 3.1.1, Scala 2.12)
- Autoscaling: Disabled
- Automatic Termination: After 30 minutes of inactivity
- Worker Type: Standard_DS3_v2
- Number of workers: 1
- Spot instances: Disabled
- Driver Type: Same as the worker
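If you prefer to script cluster creation rather than clicking through the UI, the same settings can be expressed as a payload for the Databricks Clusters API 2.0. The following is a minimal sketch, assuming a workspace URL and a personal access token that you supply yourself; the walkthrough in this chapter uses the UI.

```python
import requests

# Placeholders: substitute your own workspace URL and personal access token
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Mirrors the UI configuration listed above
cluster_spec = {
    "cluster_name": "packt-cluster",
    "spark_version": "8.3.x-scala2.12",    # DBR 8.3 (Spark 3.1.1, Scala 2.12)
    "node_type_id": "Standard_DS3_v2",     # worker type
    "num_workers": 1,                      # fixed size, i.e., autoscaling disabled
    "autotermination_minutes": 30,         # terminate after 30 minutes of inactivity
    "azure_attributes": {
        "availability": "ON_DEMAND_AZURE"  # on-demand VMs, i.e., spot instances disabled
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster's ID
```

Omitting `driver_node_type_id` leaves the driver on the same node type as the workers, which matches the Driver Type setting above.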
Now, create a new notebook and attach it to the newly created cluster to get started!
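Once the notebook is attached, a quick sanity check, sketched below, confirms that the cluster is up and running the expected runtime; `spark` is predefined in Databricks notebooks.

```python
# Run in the first cell of the new notebook; `spark` is predefined in Databricks.
print(spark.version)  # Expect 3.1.1 on Databricks Runtime 8.3

# A trivial job to verify the cluster executes Spark work end to end
spark.range(5).show()
```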