Core batch processing patterns
In this section, we will look at a few commonly used data engineering patterns for solving batch processing problems. Although implementations vary, the patterns themselves are generic and do not depend on the technologies used to implement them.
The staged Collect-Process-Store pattern
The staged Collect-Process-Store pattern is the most common batch processing pattern. In data engineering, it is also known as the Extract-Transform-Load (ETL) pattern. This architectural pattern is used to ingest data from its sources and store it as usable information. The following diagram depicts this architectural pattern:
Figure 7.1 – The staged Collect-Process-Store pattern
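To make the three stages concrete, here is a minimal Python sketch of one way the pattern could be wired together. It is an illustration under stated assumptions, not a reference implementation: the function names (collect, process, store, run_batch), the order records, the file names, and the choice of CSV for the landed data and JSON for the output are all hypothetical and do not come from any particular tool.

```python
import csv
import json
import tempfile
from pathlib import Path


def collect(source_rows, work_dir):
    """Collect stage: land the source data, unchanged, in a staging file."""
    raw_path = Path(work_dir) / "orders_raw.csv"  # hypothetical file name
    with raw_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(source_rows)
    return raw_path


def process(raw_path):
    """Process stage: cast types and skip rows that fail validation."""
    cleaned = []
    with raw_path.open(newline="") as f:
        for row in csv.DictReader(f):
            try:
                amount = float(row["amount"])
            except ValueError:
                continue  # malformed amount: leave the row out of the output
            cleaned.append({"order_id": row["order_id"], "amount": amount})
    return cleaned


def store(records, work_dir):
    """Store stage: persist the processed records for downstream use."""
    out_path = Path(work_dir) / "orders.json"  # hypothetical file name
    out_path.write_text(json.dumps(records))
    return out_path


def run_batch(source_rows, work_dir):
    """Run the three stages in order for a single batch."""
    raw_path = collect(source_rows, work_dir)
    records = process(raw_path)
    return store(records, work_dir)


if __name__ == "__main__":
    # Hypothetical input: one valid row and one row with a malformed amount.
    sample = [
        {"order_id": "1", "amount": "10.5"},
        {"order_id": "2", "amount": "oops"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        result_path = run_batch(sample, tmp)
        print(result_path.read_text())
```

Keeping each stage as a separate function that hands off through a file mirrors the staged nature of the pattern: a failed Process step can be rerun from the landed data without going back to the source.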
We can break this pattern into a series of stages, as follows:
- In this architectural pattern, one or more data sources are extracted and kept in a form...