Examining the Synapse Spark architecture
With Synapse Spark pools, Microsoft adds another scalable parallel processing engine to the Synapse ecosystem. The Microsoft implementation of Spark adds in-memory processing capabilities that support languages such as Python, Scala, Java, and SQL, and even .NET through .NET for Apache Spark.
The engine comes with built-in compatibility with Azure Data Lake Storage Gen2 and Azure Storage. This enables the Spark Core engine, via the YARN layer (which handles resource management and job scheduling/monitoring), to access the data that you have brought to Azure. This way, Spark Core exposes the storage components to libraries such as Spark SQL for interactive querying, MLlib for machine learning, and GraphX for graph computation at scale.
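To make the storage integration more concrete, the following is a minimal sketch of how a notebook might address data in the lake and hand it to Spark SQL. It assumes the data is stored as Parquet and that a Spark session object named spark is already available, as is typically the case in a Synapse notebook. The container name raw, the account name mydatalake, the folder sales/2021/, and the helper names abfss_path and top_rows are all hypothetical and only serve as an illustration.

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss:// URI for a file or folder in Azure Data Lake Storage Gen2."""
    # General shape: abfss://<container>@<account>.dfs.core.windows.net/<path>
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def top_rows(spark, path: str, limit: int = 10):
    """Read Parquet data from the lake and query it through Spark SQL.

    `spark` is expected to be an existing SparkSession; it is passed in
    so that this sketch does not need to create one itself.
    """
    df = spark.read.parquet(path)
    # Register the DataFrame under a temporary name so SQL can reference it.
    df.createOrReplaceTempView("lake_data")
    return spark.sql(f"SELECT * FROM lake_data LIMIT {int(limit)}")


# Hypothetical container, account, and folder names (assumptions).
path = abfss_path("raw", "mydatalake", "/sales/2021/")
print(path)
```

In a notebook attached to a Spark pool, calling top_rows(spark, path).show() would then display the first rows, provided that the pool has permission to read from that storage account.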
Spark implements in-memory computation algorithms that can run your Spark jobs or notebooks in parallel on defined clusters. As mentioned previously, clusters will hold the data to be computed in memory in a distributed...