Developing a Batch Processing Solution
In Chapter 4, Ingesting and Transforming Data, you learned about services such as Spark, Azure Data Factory (ADF), and Synapse SQL. This chapter builds on those services and introduces a few more batch-processing technologies.
In this chapter, you will develop robust batch-processing solutions using Azure’s analytics services such as Data Lake Storage, Databricks, Synapse Analytics, and Data Factory. Focus areas include optimizing data loading into SQL pools with PolyBase and implementing Azure Synapse Link to query data. You’ll create scalable data pipelines, tune batch sizes, and test pipelines to verify that they behave as intended. You will also integrate notebooks into your solutions to extend their analytical capabilities.
You will also learn how to upsert data, revert data to a previous state, configure exception handling, and manage batch retention policies for effective data life cycle management. Interacting...