Chapter 6: Using Synapse Spark Pools
In your modern data warehouse project, you may use Azure Data Factory ETL pipelines (see Chapter 5, Integrating Data into Your Modern Data Warehouse) to integrate and transform incoming data according to your needs. However, you may be a more code-oriented developer, you may already be proficient with Spark, or your transformation requirements may exceed the functionality or the available compute power of Data Factory.
You may also need to train and deploy machine learning models as part of your project, and want a Spark engine that scales to your needs, offers suitable libraries, and integrates tightly with the other tools that you plan to use on Azure.
This chapter will discuss Synapse Spark pools and how to implement them on Azure. You will learn about their architecture and how jobs are handled when they are dispatched to a cluster. You will examine how to implement notebooks and Spark jobs and integrate...