Processing Streaming Data with Pub/Sub and Dataflow
Processing streaming data is becoming increasingly popular because it enables businesses to get real-time metrics on their operations. In this chapter, we will look at which paradigm should be used for streaming data, and when. We will also learn how to apply transformations to streaming data using Cloud Dataflow, and how to store the processed records in BigQuery for analysis.
Learning about streaming data is easier by doing, so we will complete some exercises in which we create a streaming data pipeline on Google Cloud Platform (GCP). We will use two GCP services, Pub/Sub and Dataflow, both of which are essential for creating a streaming data pipeline. At the end of this chapter, we will compare the streaming approach with the batch approach that we learned about in Chapter 5, Building a Data Lake Using Dataproc, and see where the two are similar and where they differ.
Here are the topics that we will discuss in this chapter:
-
...