Summary
In this chapter, we looked at why we need to ingest data in the first place and at the steps needed to create a robust and reliable data ingestion pipeline. We learned that there are eight essential steps to ingesting data, which can be covered either by off-the-shelf ETL tools or by custom scripts, depending on your specific ingestion needs. We also learned that to guarantee the long-term quality of your data ingestion pipeline, you need to consider three key topics: scalability and resilience; monitoring, logging, and alerting; and governance.
With this knowledge, you should be able to capture the data you need from a source system. In the next chapter, we will look at how to pick a data warehouse that fits your needs and how to load and use this data in it.