Technical requirements
Before beginning this chapter, make sure you have a few prerequisites ready.
In this chapter’s exercises, we will use Dataflow, Pub/Sub, Google Cloud Storage (GCS), and BigQuery. If you have never opened any of these services in your GCP console, open them now and enable their application programming interfaces (APIs).
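Instead of clicking through the console for each service, you can enable the APIs from Cloud Shell in one command. This is a minimal sketch; the service names below are the standard API identifiers for these products, and it assumes your gcloud CLI is already authenticated and pointed at the right project:

```shell
# Enable the APIs used in this chapter for the currently configured project.
# Assumes you have already run `gcloud auth login` and
# `gcloud config set project YOUR_PROJECT_ID` (YOUR_PROJECT_ID is a placeholder).
gcloud services enable \
    dataflow.googleapis.com \
    pubsub.googleapis.com \
    storage.googleapis.com \
    bigquery.googleapis.com
```

You can verify what is enabled afterward with `gcloud services list --enabled`.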
Also, make sure you have your GCP console, Cloud Shell, and Cloud Shell Editor ready.
You can download the example code and the dataset from here: https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform-Second-Edition/tree/main/chapter-6.
Be aware of the cost that might arise from Dataflow streaming. Make sure you delete all the environments after executing the exercises in this chapter to prevent unexpected charges.
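Streaming Dataflow jobs keep workers running (and billing) until they are explicitly stopped, so it is worth checking for active jobs when you finish. A minimal sketch using the gcloud CLI, assuming your jobs run in `us-central1` (substitute your own region and job ID):

```shell
# List any Dataflow jobs that are still running in the region.
gcloud dataflow jobs list --status=active --region=us-central1

# Cancel a running job by its ID (JOB_ID is a placeholder from the list above).
gcloud dataflow jobs cancel JOB_ID --region=us-central1
```

Remember that Pub/Sub subscriptions, GCS buckets, and BigQuery datasets created during the exercises may also incur storage costs and should be deleted separately.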
This chapter uses the same data as Chapter 5, Building a Data Lake Using Dataproc. You can either reuse that data or prepare new data from the preceding GitHub repository.