Summary
In this chapter, you learned how to build and deploy a production data pipeline. You learned how to create TEST and PRODUCTION environments and built the data pipeline in TEST. You used the filesystem as a sample data lake and learned how you would read files from the lake and monitor them as they were processed. Instead of loading data into the data warehouse, this chapter taught you how to use a staging database to hold the data so that it could be validated before being loaded into the data warehouse. Using Great Expectations, you were able to build a validation processor group that would scan the staging database to determine whether the data was ready to be loaded into the data warehouse. Lastly, you learned how to deploy the data pipeline into PRODUCTION. With these skills, you can now fully build, test, and deploy production batch data pipelines.
In the next chapter, you will learn how to build Apache Kafka clusters. Using Kafka, you will begin to learn how to process...