Exercise – publishing event streams to Pub/Sub
In this exercise, we will stream data from Pub/Sub publishers. The goal is to build a data pipeline that streams data into a BigQuery table, but instead of using a scheduler (as we did in Chapter 4, Building Workflows for Batch Data Loading Using Cloud Composer), we will submit a Dataflow job that runs as a long-lived application, continuously moving data from Pub/Sub to a BigQuery table (previewed in the sketch after the step list). We will reuse the bike-sharing dataset from Chapter 3, Building a Data Warehouse in BigQuery. Here are the overall steps we will cover:
- Creating a Pub/Sub topic.
- Creating and running a Pub/Sub publisher using Python.
- Creating a Pub/Sub subscription.
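To give a sense of where these steps lead, here is a minimal sketch of the kind of streaming pipeline we are working toward, written with the Apache Beam Python SDK (the SDK that Dataflow executes). The topic name, table name, and message format here are hypothetical placeholders for illustration, not the exact values we will use in the exercise:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Assumption: hypothetical topic and table names for illustration only.
TOPIC = "projects/your-project-id/topics/bike-sharing-topic"
TABLE = "your-project-id:raw_bikesharing.trips"

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read raw message bytes from the Pub/Sub topic as an unbounded source.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
        # Decode each message from JSON into a Python dictionary.
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append the rows to an existing BigQuery table.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```

Running such a pipeline on Dataflow is a matter of passing the appropriate runner options; we will build up to that step by step, starting with the topic.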
We’ll start by creating a Pub/Sub topic.
Creating a Pub/Sub topic
We can create Pub/Sub topics in several ways: through the GCP console, with the gcloud command-line tool, or programmatically in code. As a starter, let's use the GCP console. Proceed...
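For reference, here is a minimal sketch of the programmatic approach using the google-cloud-pubsub Python client library; the project ID and topic name are placeholders you would replace with your own values:

```python
from google.cloud import pubsub_v1

# Assumption: placeholder project ID and topic name; substitute your own.
project_id = "your-project-id"
topic_id = "bike-sharing-topic"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

# Create the topic; this raises AlreadyExists if the topic already exists.
topic = publisher.create_topic(request={"name": topic_path})
print(f"Created topic: {topic.name}")
```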