Chapter 7: Using Databricks Spark Clusters
In the last chapter, Chapter 6, Using Synapse Spark Pools, you learned about Spark and the Spark engine integrated into Synapse. But what about cases where you only need a Spark cluster to interact with your Data Lake Store? At this point in time you would, for example, choose Databricks over Synapse Spark pools when you need to work on Spark 3.0 or implement Structured Streaming. You will also need Databricks if R is a required programming language or if you depend on the Databricks-specific features of Delta Lake, such as vacuuming. Synapse will offer most of these options in the future, but at the moment they are available only in Databricks.
With Azure Databricks, Microsoft offers a standalone Spark environment that gives you all the aforementioned options and can still integrate with other data services on Azure if needed. And with Databricks, you have the people who invented Spark at your back. The cluster architecture...