Technical requirements
Before we begin this chapter, make sure you have the following prerequisites ready.
The exercises in this chapter use four GCP services: Dataproc, Google Cloud Storage (GCS), BigQuery, and Cloud Composer. If there is any service in this list that you have never opened in your GCP console, open it and enable its API.
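The APIs can also be enabled from Cloud Shell instead of opening each service page. The following is a minimal sketch, assuming the usual service names for these four products and the `gcloud services enable` command. The helper `build_enable_command` and the `REQUIRED_APIS` list are hypothetical names introduced here for illustration. The script only builds and prints the command, so you can review it before running it yourself.

```python
import shlex

# Assumed API service names for the four products used in this chapter.
REQUIRED_APIS = [
    "dataproc.googleapis.com",
    "storage.googleapis.com",
    "bigquery.googleapis.com",
    "composer.googleapis.com",
]


def build_enable_command(apis=REQUIRED_APIS, project_id=None):
    """Return the gcloud argument list that enables the given APIs."""
    command = ["gcloud", "services", "enable", *apis]
    if project_id:
        # Without this flag, gcloud uses the project configured in Cloud Shell.
        command.append(f"--project={project_id}")
    return command


if __name__ == "__main__":
    # Print the command for review; paste it into Cloud Shell to run it.
    print(shlex.join(build_enable_command()))
```

Pass your own project ID as `project_id` if Cloud Shell is not already pointed at the project you use for the exercises.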
Make sure you have your GCP console, Cloud Shell, and Cloud Shell Editor ready.
Download the example code and the dataset from https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform-Second-Edition/tree/main/chapter-5.
Be aware of the costs that the Dataproc cluster and the Cloud Composer environment can incur. Delete every cluster and environment once you finish the exercises to avoid unexpected charges.
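The cleanup can be done from Cloud Shell as well. The sketch below assumes the `gcloud dataproc clusters delete` and `gcloud composer environments delete` commands. The function `build_cleanup_commands` and the cluster, environment, and region values are hypothetical placeholders, not names used in the exercises. As before, the script only prints the commands for review.

```python
import shlex


def build_cleanup_commands(cluster_name, environment_name, region):
    """Return gcloud argument lists that delete one cluster and one environment."""
    return [
        # Dataproc clusters are addressed by region.
        ["gcloud", "dataproc", "clusters", "delete", cluster_name,
         f"--region={region}"],
        # Cloud Composer environments are addressed by location.
        ["gcloud", "composer", "environments", "delete", environment_name,
         f"--location={region}"],
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with the names you actually created.
    commands = build_cleanup_commands(
        cluster_name="my-dataproc-cluster",
        environment_name="my-composer-env",
        region="us-central1",
    )
    for command in commands:
        print(shlex.join(command))
```

The commands are built without the `--quiet` flag on purpose, so `gcloud` still asks for confirmation before deleting anything.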