Precautions to consider
Back in Chapter 2, we noted that data pipelines serve a wide range of purposes, from daily updates for business analytics dashboards to cyclical long-term storage. Because many organizations base decisions on the resulting output data, accurate transformations alone are not enough: the format and quality of the loaded data must also remain consistent with the data that already exists in the target location.
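One way to picture this kind of precaution is a check that runs just before the load step and compares each incoming row against the structure the target expects. The following is only a minimal sketch under stated assumptions: the `TARGET_SCHEMA` mapping, its column names, and the `find_schema_violations` helper are hypothetical names invented for illustration, and a real pipeline would more likely read the expected structure from the target system itself.

```python
from datetime import date

# Hypothetical target schema (an assumption for illustration):
# column name -> expected Python type.
TARGET_SCHEMA = {"order_id": int, "order_date": date, "amount": float}


def find_schema_violations(rows, schema):
    """Return a list of readable problems; an empty list means the batch conforms."""
    problems = []
    for i, row in enumerate(rows):
        # Columns the target expects but the row lacks, and vice versa.
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Type check for every expected column that is present.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: {column} is {type(row[column]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


# Example batch: the second row carries order_id as a string.
batch = [
    {"order_id": 1, "order_date": date(2023, 1, 5), "amount": 19.99},
    {"order_id": "2", "order_date": date(2023, 1, 6), "amount": 5.0},
]
violations = find_schema_violations(batch, TARGET_SCHEMA)
```

Here the batch would be held back rather than loaded, because `violations` is not empty. Collecting every problem instead of stopping at the first one is a design choice that makes a rejected batch easier to diagnose in a single pass.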
In a clean, reproducible, and scalable data ecosystem, the target output location maintains its own, arguably authoritative, structure that serves as the ground truth for business data within your company. This requires you to scrupulously manage the ongoing ETL processes that keep the storage environment up to date. Our discussion of full and incremental data loads in the previous section made it clear that there is a need to distinguish between new, freshly curated data and...