Exploring clusters
Clusters are the primary computing units that do the heavy lifting when you train your ML models. The VMs associated with a cluster are provisioned in the Databricks user’s own cloud subscription; however, the Databricks UI provides an interface to control the cluster type and its settings.
Clusters are ephemeral compute resources. No data is stored on clusters:
Figure 2.6 – The Clusters tab
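The cluster type and settings controlled through the UI can also be written down as a plain configuration. The following is a minimal sketch rather than a definitive recipe: the field names follow the request body of the Databricks Clusters API as we understand it, the helper build_cluster_spec is hypothetical, and the node type and runtime version are placeholder values that you should replace with ones available in your workspace. The autotermination_minutes setting reflects the ephemeral nature of clusters, since an idle cluster is shut down and its VMs are released.

```python
import json


def build_cluster_spec(name, node_type_id, spark_version,
                       num_workers=2, autotermination_minutes=30):
    """Return a dict shaped like a cluster-creation request body.

    Hypothetical helper for illustration only; it builds the payload
    and does not call any Databricks endpoint.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,   # Databricks runtime version
        "node_type_id": node_type_id,     # cloud VM type for the nodes
        "num_workers": num_workers,       # worker nodes besides the driver
        # Idle minutes after which the cluster is terminated.
        "autotermination_minutes": autotermination_minutes,
    }


# Placeholder values (assumptions): check your workspace for valid options.
spec = build_cluster_spec(
    name="ml-training",
    node_type_id="i3.xlarge",
    spark_version="13.3.x-cpu-ml-scala2.12",
)
print(json.dumps(spec, indent=2))
```

Keeping the specification in code makes it easy to review and reuse the same settings across environments. Because nothing is stored on the cluster itself, recreating it from such a specification loses no data.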
The Pools feature allows end users to create pools of Databricks VMs. One of the benefits of working in the cloud is that you can request new compute resources on demand: the end user pays by the second and returns the compute once the load on the cluster is low. However, requesting a VM from the cloud provider, starting it up, and adding it to a cluster still takes some time. With pools, you can pre-provision VMs and keep them in a standby state. If a cluster requests new nodes and has access...