Technical requirements
Before we begin the chapter, make sure you have the following prerequisites ready.
In this chapter's exercises, we will use the following GCP services: Dataflow, Pub/Sub, Google Cloud Storage (GCS), and BigQuery. If you have never opened any of these services in your GCP console, open each of them and enable its application programming interface (API).
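If you prefer the command line, the same APIs can be enabled from Cloud Shell with a single `gcloud` command. This is a minimal sketch; the four service identifiers below are the standard API names for these services:

```shell
# Enable the APIs for the services used in this chapter.
# Run this in Cloud Shell, where gcloud is pre-installed and
# already authenticated against your active project.
gcloud services enable \
    dataflow.googleapis.com \
    pubsub.googleapis.com \
    storage.googleapis.com \
    bigquery.googleapis.com
```

You can verify the result with `gcloud services list --enabled`, which prints the APIs currently enabled for the project.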
Make sure you have your GCP console, Cloud Shell, and Cloud Shell Editor ready.
Download the example code and the dataset from here:
https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform/tree/main/chapter-6
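One way to get the code into your Cloud Shell environment is to clone the repository and change into this chapter's folder; the repository URL is taken from the link above:

```shell
# Clone the book's example repository and enter this chapter's folder.
git clone https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform.git
cd Data-Engineering-with-Google-Cloud-Platform/chapter-6
```

Alternatively, you can download the repository as a ZIP file from the GitHub page and upload it to Cloud Shell.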
Be aware of the costs that can be incurred by Dataflow streaming jobs, which keep running (and billing) until they are explicitly stopped. Make sure you delete all the resources after the exercises to prevent unexpected costs.
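As a rough cleanup checklist, the commands below cancel any running Dataflow jobs and remove the other resources. This is a sketch only: `REGION`, `JOB_ID`, and the resource names are placeholders for the values you used in the exercises, not names defined in this chapter.

```shell
# List any Dataflow jobs that are still active, then cancel them.
# REGION and JOB_ID are placeholders (e.g. us-central1).
gcloud dataflow jobs list --status=active --region=REGION
gcloud dataflow jobs cancel JOB_ID --region=REGION

# Delete the Pub/Sub subscription and topic (names are placeholders).
gcloud pubsub subscriptions delete SUBSCRIPTION_NAME
gcloud pubsub topics delete TOPIC_NAME

# Remove the GCS bucket and the BigQuery dataset used in the exercises.
gsutil -m rm -r gs://BUCKET_NAME
bq rm -r -f DATASET_NAME
```

Cancelling a streaming job stops billing for its workers; deleting the bucket and dataset removes the associated storage charges.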
This chapter uses the same data as Chapter 5, Building a Data Lake Using Dataproc. You can reuse that data or prepare new data from the GitHub repository.