Batch ingestion architectures
The simplest ingestion architecture covers the use case where data is ingested only in batches from other cloud-based sources, with no sources residing on-premises. In this case, we use data pipelines to periodically fetch large amounts of data and write them to the bronze layer in the data lake. Note that we refrain from performing any kind of transformation in this initial pipeline.
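To make the "no transformation" rule concrete, here is a minimal sketch in Python. It is not how Azure Data Factory or Synapse pipelines are implemented; it only illustrates the idea of a batch run that lands files in the bronze layer byte for byte. The local filesystem stands in for both the source and the data lake, and the function name ingest_batch, the source name crm, and the ingest_date=YYYY-MM-DD folder convention are hypothetical choices made for this example.

```python
from datetime import date
from pathlib import Path
import shutil
import tempfile


def ingest_batch(source_dir: Path, bronze_root: Path,
                 source_name: str, run_date: date) -> list[Path]:
    """Copy every file in source_dir into the bronze layer unchanged.

    Hypothetical layout: <bronze_root>/<source_name>/ingest_date=<YYYY-MM-DD>/
    """
    target_dir = bronze_root / source_name / f"ingest_date={run_date.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(source_dir.iterdir()):
        if src.is_file():
            dest = target_dir / src.name
            # Raw byte copy: no parsing, cleaning, or schema enforcement here.
            shutil.copyfile(src, dest)
            written.append(dest)
    return written


if __name__ == "__main__":
    # Demo with temporary folders standing in for the source and the lake.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source"
        source.mkdir()
        (source / "customers.csv").write_bytes(b"id,name\n1, Ann \n")

        landed = ingest_batch(source, Path(tmp) / "bronze", "crm", date.today())
        for path in landed:
            print(path.relative_to(tmp))
```

Partitioning the landing folder by ingestion date is one common way to keep each periodic batch separate and replayable. Any cleaning, such as trimming the stray spaces in the sample row, is deliberately left to later layers.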
We will look at ingesting data from the following sources:
- Cloud sources
- On-premises sources
Let’s first look at how to ingest data from cloud-based sources.
Ingesting data from cloud sources
When ingesting data from other cloud sources, setting up the connection is often more convenient. Also, we can make use of Azure-hosted integration runtimes (IRs). These serve as the compute for the pipeline orchestration in either Azure Data Factory or Azure Synapse pipelines. Other Data Factory components will be discussed in more detail in the next...