Technical requirements
As in the other chapters, to create the environment for running the code examples in this chapter, you can run:
conda env create -f mlewp-chapter09.yml
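Once the environment has been created and activated (its name is defined inside the YAML file), you may want a quick sanity check that the core packages can be found. The following is a minimal sketch, not part of the chapter's own code. It assumes only that the installed packages expose the top-level modules `airflow` and `pyspark`. The helper name `check_modules` is hypothetical.

```python
import importlib.util

# Hypothetical helper: the module names below are assumptions based on
# the packages this chapter's environment is described as installing.
EXPECTED_MODULES = ("airflow", "pyspark")


def check_modules(names=EXPECTED_MODULES):
    """Return a mapping of module name -> True if it is importable.

    importlib.util.find_spec returns None for a top-level module that
    cannot be found, so nothing is actually imported here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Using `find_spec` rather than a plain `import` keeps the check cheap, because importing Airflow triggers its configuration and database setup logic.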
This will install Airflow, PySpark, and some supporting packages. For the Airflow examples, we will work locally and assume that, if you want to deploy to the cloud, you can follow the details given in Chapter 5, Deployment Patterns and Tools. If you have run the above conda command, then you will have installed Airflow locally, along with PySpark and the Airflow PySpark connector package, so you can run Airflow in standalone mode with the following command in the terminal:
airflow standalone
This will then instantiate a local database and all relevant Airflow components. There will be a lot of output to the terminal, but near the end of the first phase of output, you should be able to spot details about the local server that is running, including a generated user ID and password...
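If you would rather confirm from a script that the local server is up instead of scanning the terminal output, you could poll the webserver's health endpoint. This is a minimal sketch under two assumptions: the standalone webserver listens on its default port, 8080, and it is an Airflow 2.x webserver exposing the `/health` endpoint, which returns JSON describing components such as the metadatabase and scheduler. Newer Airflow versions may serve health information at a different path. The helper name `airflow_health` is hypothetical.

```python
import json
import urllib.request

# Assumption: default standalone webserver port and the Airflow 2.x
# /health endpoint. Adjust the URL if your setup differs.
DEFAULT_HEALTH_URL = "http://localhost:8080/health"


def airflow_health(url=DEFAULT_HEALTH_URL, timeout=5.0):
    """Return the parsed health JSON, or None if it cannot be fetched.

    urllib.error.URLError (including HTTPError) subclasses OSError, and
    json.JSONDecodeError subclasses ValueError, so catching these two
    covers connection failures, HTTP errors, timeouts, and bad payloads.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    status = airflow_health()
    if status is None:
        print("Airflow webserver not reachable yet")
    else:
        for component, info in status.items():
            # Each component is expected to be a dict with a "status" key.
            print(component, info.get("status") if isinstance(info, dict) else info)
```

Returning `None` rather than raising keeps the helper easy to call in a retry loop while the standalone components are still starting.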