Chapter 9: Building Your Data Pipeline
In the previous chapter, you learned about the example business goal of improving user experience by recommending flights with a higher on-time probability, and you worked with the business subject matter expert (SME) to understand the available data. In this chapter, you will see how the platform helps you harvest and process data from a variety of sources, how on-demand Spark clusters can be created, and how workloads can be isolated in a shared environment. Because new flight data may arrive frequently, you will also see how the platform enables you to automate the execution of your data pipeline.
In this chapter, you will learn about the following topics:
- Automated provisioning of a Spark cluster for development
- Writing a Spark data pipeline
- Using the Spark UI to monitor your jobs
- Building and executing a data pipeline using Airflow
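As a preview of the last topic, the following is a minimal sketch of how a scheduled Airflow DAG might submit a Spark pipeline each day as new flight data arrives. The DAG name, schedule, and script path are illustrative assumptions, not the actual pipeline built later in this chapter:

```python
# A minimal sketch of a daily Airflow DAG that submits a Spark job.
# The DAG id, schedule, and script path are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flights_data_pipeline",     # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",         # run once per day as new flight data arrives
    catchup=False,
) as dag:
    # Submit the Spark pipeline script; the path is an assumed placeholder.
    run_spark_pipeline = BashOperator(
        task_id="run_spark_pipeline",
        bash_command="spark-submit /opt/pipelines/flights_pipeline.py",
    )
```

The sections that follow walk through each of these steps in detail, starting with provisioning a Spark cluster for development.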