Building our streaming data pipeline
We’ll stick with the theme of planning our trip to New York City. The activities in the previous sections of this chapter gave us good insight into the accommodation options available to us. Now we want to assess our transportation options; specifically, how much it’s likely to cost us to travel around in taxis while we’re there.
Our streaming data pipeline will take input data from Google Cloud Pub/Sub, perform some processing in Dataflow, and place the outputs into BigQuery for analysis. The architecture of our pipeline on Google Cloud is shown in Figure 6.8:
Figure 6.8: Streaming data pipeline
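To make the flow in the figure concrete before we build it on Google Cloud, here is a minimal local sketch of the same three stages in plain Python. It does not call Pub/Sub, Dataflow, or BigQuery; each stage is a simple stand-in, and the message contents and field names (`ride_id`, `ride_status`, `meter_reading`) are hypothetical values chosen for illustration rather than the actual schema of any stream.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical sample payloads (assumption: JSON-encoded messages).
# The last one is deliberately malformed to show how bad input is handled.
SAMPLE_MESSAGES = [
    b'{"ride_id": "a1", "ride_status": "dropoff", "meter_reading": 23.5}',
    b'{"ride_id": "b2", "ride_status": "enroute", "meter_reading": 4.0}',
    b"not valid json",
]


def read_messages(messages: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the Pub/Sub source: yield raw payloads one at a time."""
    yield from messages


def to_row(payload: bytes) -> Optional[dict]:
    """Stand-in for the Dataflow step: parse a payload into a row dict.

    Returns None for payloads that cannot be parsed or lack a ride_id.
    """
    try:
        record = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict) or "ride_id" not in record:
        return None
    return {
        "ride_id": record["ride_id"],
        "ride_status": record.get("ride_status"),
        "meter_reading": float(record.get("meter_reading") or 0.0),
    }


def run_pipeline(messages: Iterable[bytes]) -> list:
    """Wire the stages together: source -> processing -> sink."""
    table = []  # stand-in for the BigQuery destination table
    for payload in read_messages(messages):
        row = to_row(payload)
        if row is not None:
            table.append(row)
    return table


if __name__ == "__main__":
    for row in run_pipeline(SAMPLE_MESSAGES):
        print(row)
```

The point of the sketch is the separation of concerns: the source only delivers payloads, the processing step decides what a valid row looks like, and the sink only stores rows. The managed services in the figure take over those same roles, adding the scaling and delivery guarantees that a local loop does not provide.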
Google Cloud provides a public stream of data about New York City taxi rides that can be used to test these kinds of stream processing workloads, and we will use it in our example in this section. Let’s start by creating a destination for our...