Creating a cluster in our workspace
We need a cluster to manipulate and transform data with Databricks. A cluster typically consists of at least two machines:
- A driver node: Receives the commands and dispatches them to the worker nodes.
- A worker node: Receives the commands from the driver and executes them. We can use multiple workers, which execute the commands in parallel.
There are also two types of clusters:
- Interactive: A cluster that is started manually. It is used to run interactive queries from a notebook or from another program connected to it, such as Power BI.
- Automated: A cluster that is created automatically to run a job and terminated once the job finishes. For example, this is the type of cluster used when we call a Databricks activity from Azure Data Factory.
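To make the driver/worker layout more concrete, here is a minimal Python sketch of the settings that describe an interactive cluster, written as the kind of request body the Databricks Clusters REST API accepts. The helper name build_cluster_spec, the cluster name, the runtime version, and the VM size are assumptions chosen for illustration, not values taken from this recipe; replace them with the ones available in your own workspace. The sketch only builds and prints the settings and does not call any service.

```python
import json


def build_cluster_spec(name, num_workers, spark_version, node_type_id,
                       autotermination_minutes=60):
    """Build a settings dictionary for one driver plus `num_workers` workers.

    Hypothetical helper for illustration; the keys follow the field names
    used by the Databricks Clusters API.
    """
    if num_workers < 1:
        # The layout described above has one driver and at least one worker.
        raise ValueError("num_workers must be at least 1")
    return {
        "cluster_name": name,
        "spark_version": spark_version,    # Databricks runtime version key
        "node_type_id": node_type_id,      # VM size used for the nodes
        "num_workers": num_workers,        # the driver is not counted here
        "autotermination_minutes": autotermination_minutes,
    }


# Example values only (assumptions): adjust them to your workspace.
spec = build_cluster_spec(
    name="cookbook-interactive",
    num_workers=2,
    spark_version="13.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
)
print(json.dumps(spec, indent=2))
```

The autotermination_minutes setting matters mostly for interactive clusters: since they are started manually, an idle timeout keeps them from running, and being billed, after we stop using them.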
Let's create a cluster in our Databricks workspace now.
Getting ready
As with every recipe in this chapter, you will need to upgrade your trial Azure subscription to a Pay-As-You-Go subscription if this is not what you have been using...