Understanding the SQL Warehouse architecture
The official documentation for SQL Warehouses (https://docs.databricks.com/sql/admin/sql-endpoints.html) defines a SQL Warehouse as a computation resource that lets you run SQL commands on data objects within Databricks SQL.
In practice, this computation resource takes the form of a logical grouping of one or more physical clusters. Each physical cluster is an Apache Spark cluster provisioned by Databricks.
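To make the "logical grouping" idea concrete, here is a minimal Python sketch that models a warehouse as a named collection of physical clusters. The class names `SqlWarehouse` and `PhysicalCluster`, their fields, and the sample identifiers are hypothetical and chosen only for illustration; they are not part of any Databricks API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalCluster:
    """One Apache Spark cluster (hypothetical model, not a Databricks API)."""
    cluster_id: str


@dataclass
class SqlWarehouse:
    """A logical grouping of one or more physical clusters (hypothetical model)."""
    name: str
    clusters: List[PhysicalCluster] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: mirror the "one or more" wording by rejecting an empty grouping.
        if not self.clusters:
            raise ValueError("a warehouse needs at least one physical cluster")

    def cluster_count(self) -> int:
        """Return how many physical clusters back this warehouse."""
        return len(self.clusters)


# Illustrative usage with made-up names.
warehouse = SqlWarehouse(
    name="demo-warehouse",
    clusters=[PhysicalCluster("cluster-0"), PhysicalCluster("cluster-1")],
)
print(warehouse.cluster_count())
```

The point of the sketch is only that the warehouse is the unit users address, while the clusters behind it are the units that actually execute work.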
A single physical cluster follows the core architecture of Apache Spark, as shown in the following diagram:
Figure 6.1 – Physical cluster topology
As shown in the preceding diagram, two distinct types of processes make up a cluster:
- Driver process: Think of this process as the brain of the cluster. It is responsible for accepting users’ queries, parsing them, planning them, and coordinating their distributed execution across the worker processes available in the cluster. The driver process...