Developing batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks
Let's try to build an end-to-end batch pipeline using all the technologies listed in the topic header. We will use our Imaginary Airport Cab (IAC) example from the previous chapters to create a sample requirement for our batch processing pipeline. Let's assume that trip data from different regions (identified by zip code) arrives continuously and is stored in Azure Blob storage, while the corresponding trip fares are stored in an Azure SQL database. Our requirement is to merge these two datasets and generate daily revenue reports for each region.
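Before wiring anything into a pipeline, it helps to see the core merge-and-aggregate logic on its own. The following is a minimal PySpark sketch of that step; the column names (`trip_id`, `zip_code`, `trip_date`, `fare_amount`), storage paths, and connection details are illustrative assumptions, not the actual IAC schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("IACDailyRevenue").getOrCreate()

# Trip data lands in Azure Blob storage as CSV (path is hypothetical).
trips = spark.read.option("header", True).csv(
    "wasbs://trips@iacstore.blob.core.windows.net/trips/"
)

# Fare data sits in an Azure SQL database, read over JDBC
# (server, database, and credentials are placeholders).
fares = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://iacsql.database.windows.net:1433;database=iacdb")
    .option("dbtable", "dbo.TripFares")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Merge the two datasets on the trip identifier, then roll up
# revenue per region (zip code) per day.
daily_revenue = (
    trips.join(fares, on="trip_id")
    .groupBy("zip_code", F.to_date("trip_date").alias("report_date"))
    .agg(F.sum("fare_amount").alias("daily_revenue"))
)

# Write the daily report back to the data lake, partitioned by date
# (output path is hypothetical).
daily_revenue.write.mode("overwrite").partitionBy("report_date").parquet(
    "abfss://reports@iaclake.dfs.core.windows.net/daily_revenue/"
)
```

In the full solution, this transformation would typically run as a Spark notebook or job (for example, on Azure Databricks) triggered from the pipeline, rather than as a standalone script.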
To meet this requirement, we can build a pipeline like the one shown in the following diagram:
When translated into an Azure Data Factory (ADF) pipeline, the preceding design would look like the following figure: