Exercise – Publishing event streams to Cloud Pub/Sub
In this exercise, we will stream data from a Pub/Sub publisher. The goal is to create a data pipeline that streams the data into a BigQuery table, but instead of using a scheduler (as we did in Chapter 4, Building Orchestration for Batch Data Loading Using Cloud Composer), we will submit a Dataflow job that runs as a long-lived application, flowing data from Pub/Sub to a BigQuery table; a rough sketch of such a job follows the steps list below. For the data, we will use the bike-sharing dataset from Chapter 3, Building a Data Warehouse in BigQuery. Here are the overall steps in this Pub/Sub section:
- Creating a Pub/Sub topic
- Creating and running a Pub/Sub publisher using Python
- Creating a Pub/Sub subscription
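The Dataflow job itself is built later in the chapter; as a rough preview of the shape it can take, here is a minimal Apache Beam sketch of a streaming Pub/Sub-to-BigQuery pipeline. The subscription path and table name are assumed placeholders, not values from this exercise, and the sketch assumes the target table already exists:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder names; substitute your own project, subscription, and table.
SUBSCRIPTION = "projects/your-project-id/subscriptions/bike-sharing-subscription"
TABLE = "your-project-id:raw_bikesharing.bike_trips"

# streaming=True tells Beam this is an unbounded, always-on pipeline.
# Add --runner=DataflowRunner (plus project, region, and temp_location
# options) to submit it to Dataflow instead of running locally.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Pull raw messages (bytes) from the Pub/Sub subscription.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        # Decode each message payload from JSON into a dict (one table row).
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append each row to the existing BigQuery table.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```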
Let's start by creating a Pub/Sub topic in the next section.
Creating a Pub/Sub topic
We can create Pub/Sub topics using several approaches: for example, through the GCP console, with the gcloud command, or programmatically in code. As a starter, let's create one.
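For reference, here is what the programmatic route can look like. This is a minimal sketch using the google-cloud-pubsub Python client library; the project ID and topic name are assumed placeholders, not values from this exercise. The gcloud equivalent is a one-liner: `gcloud pubsub topics create TOPIC_NAME`.

```python
from google.cloud import pubsub_v1

# Placeholder values; substitute your own project ID and topic name.
project_id = "your-project-id"
topic_id = "bike-sharing-topic"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

# Create the topic; this raises google.api_core.exceptions.AlreadyExists
# if a topic with this name already exists in the project.
topic = publisher.create_topic(request={"name": topic_path})
print(f"Created topic: {topic.name}")
```

This assumes your credentials come from the environment, for example via `gcloud auth application-default login` or a service account key.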