Summary
Phew! This was another marathon chapter, in which you built the data processing pipeline for predicting flights' on-time performance. You have seen how the platform you built lets you write complex data pipelines with Apache Spark without worrying about provisioning and maintaining the Spark cluster; in fact, you completed all the exercises without any specific help from the IT group. You have also automated the execution of the data pipeline using the technologies the platform provides, and you worked with your Airflow pipelines directly from your IDE, the same IDE you used to write the Spark data pipeline.
Keeping in mind that the main purpose of this book is to help you provide a platform where data and ML teams can work in a self-service, independent manner, you have just achieved exactly that: you and your team own the full data engineering life cycle, including scheduling the execution of your pipelines.
In the next chapter, you will see...