Managing the quality and scalability of data ingestion pipelines – the three key topics
Apart from the individual steps of setting up a data ingestion pipeline, there are three important topics that are relevant to every step of the pipeline.
Scalability and resilience
As the load on your pipeline grows over time, so does the pressure on it to keep up in terms of performance. You might start with a sequential, single-threaded program, which is common when writing in Python, for example. Over time, however, you might want to turn parts of your pipeline into loosely coupled functions that can scale independently. For example, the extraction might run on a single machine that you scale up as the data volume grows, while the transformation and loading steps scale dynamically with the load through serverless functions.
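The split described above, with one extraction step feeding independently scalable transformation and loading workers, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production design: threads from `ThreadPoolExecutor` stand in for serverless function invocations, and the `workers` parameter plays the role of the scaling knob a serverless platform would turn for you. The record format, the `Sink` class, and the functions `extract`, `transform`, `transform_and_load`, and `run_pipeline` are all hypothetical names introduced for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


class Sink:
    """Hypothetical load target; a lock keeps concurrent writes safe."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.rows: list[dict] = []

    def write(self, row: dict) -> None:
        with self._lock:
            self.rows.append(row)


def extract(n_records: int) -> Iterator[dict]:
    """Hypothetical extraction: a single sequential producer of raw records."""
    for i in range(n_records):
        yield {"id": i, "value": f"  item-{i}  "}


def transform(record: dict) -> dict:
    """Hypothetical transformation: trim whitespace and upper-case the value."""
    return {"id": record["id"], "value": record["value"].strip().upper()}


def transform_and_load(record: dict, sink: Sink) -> None:
    """One independent unit of work, analogous to a single function invocation."""
    sink.write(transform(record))


def run_pipeline(n_records: int, workers: int) -> list[dict]:
    """Extract sequentially, then fan records out to `workers` parallel workers."""
    sink = Sink()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(transform_and_load, record, sink)
            for record in extract(n_records)
        ]
        for future in futures:
            future.result()  # re-raises any exception raised inside a worker
    # Workers finish in arbitrary order, so sort for a deterministic result.
    return sorted(sink.rows, key=lambda row: row["id"])


if __name__ == "__main__":
    rows = run_pipeline(n_records=5, workers=3)
    print(rows[0])  # {'id': 0, 'value': 'ITEM-0'}
```

Because each call to `transform_and_load` depends only on its own record, the number of workers can be raised or lowered without touching the extraction code, which is the property that makes the serverless variant possible. In a real deployment the in-process list of futures would typically be replaced by a message queue between the extraction machine and the functions.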
In any case, you will have to implement some sort of error or exception handling to be able to...