Understanding the Databricks components
In the previous chapter, Chapter 6, Using Synapse Spark Pools, we examined the basic Spark architecture, and Databricks follows the same model: a driver node distributes your requests across worker nodes that process them. We also shouldn't forget that Databricks was the first to deliver autoscaling Spark as a service, and it will even shut the compute environment down automatically as soon as an idle-time threshold is reached, a feature known as autotermination.
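To make autoscaling and autotermination concrete, the following minimal sketch provisions a cluster through the Databricks Clusters REST API (the /api/2.0/clusters/create endpoint), scaling between two and eight workers and terminating after 30 idle minutes. The workspace URL, access token, runtime version, and node size shown here are placeholder assumptions; substitute the values from your own workspace.

```python
import requests

# Placeholder workspace URL and personal access token -- assumptions,
# replace with your own workspace values.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Cluster spec: autoscaling between 2 and 8 workers, and automatic
# termination after 30 idle minutes (the autotermination threshold
# described above).
cluster_spec = {
    "cluster_name": "demo-autoscaling-cluster",
    "spark_version": "13.3.x-scala2.12",   # a Databricks Runtime version
    "node_type_id": "Standard_DS3_v2",     # Azure VM size for the nodes
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```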
Although Databricks is based on Apache Spark, it has built its own runtime, the Databricks Runtime, optimized for usage on Azure. There are practical differences too: once you spin up a cluster, for example, different sessions can attach to and reuse that same running cluster, rather than each session instantiating its own Spark instance as happens with Synapse Spark pools.
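As a rough illustration of this reuse, the sketch below queries the Clusters API list endpoint and picks a cluster that is already in the RUNNING state, so a new session could attach to it instead of provisioning a fresh one. As before, the workspace URL and token are placeholder assumptions.

```python
import requests

# Placeholder workspace URL and token -- assumptions, replace as needed.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

# Pick the first cluster that is already RUNNING, so a new session can
# reuse it rather than spinning up a new compute environment.
running = [c for c in resp.json().get("clusters", [])
           if c["state"] == "RUNNING"]
if running:
    print("Reusing cluster:", running[0]["cluster_id"])
else:
    print("No running cluster found; a new one would need to be created.")
```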
Creating Databricks clusters
This section will take you through the provisioning process for a Databricks cluster. You will see the different node sizes and the options, such as autotermination and autoscaling, that are available when you create your compute environment.
But let's see...