Creating a Dataflow pipeline to store streaming data
Google Cloud Dataflow is a managed service for stream and batch data processing at scale. When you need to process high volumes of streaming data, such as clickstreams or readings from IoT devices, the data is typically ingested through Cloud Pub/Sub and handed to a Dataflow pipeline for processing. The pipeline can then write the results to storage (BigQuery, Bigtable, or Cloud Storage) for further processing, such as machine learning.
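To make that flow concrete, here is a minimal sketch, not the recipe's template, of the kind of streaming Apache Beam pipeline that Dataflow runs: it reads messages from a Pub/Sub topic and streams them into a BigQuery table. The project, topic, dataset, and table names are placeholders, not values from this recipe.

```python
# Minimal streaming Beam pipeline sketch: Pub/Sub -> BigQuery.
# All resource names (my-project, my-topic, my_dataset.raw_events)
# are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# An unbounded Pub/Sub source requires streaming mode.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/my-topic")
        # Pub/Sub messages arrive as bytes; decode into a BigQuery row dict.
        | "DecodeToRow" >> beam.Map(lambda data: {"payload": data.decode("utf-8")})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.raw_events",
            schema="payload:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```

Passing `--runner=DataflowRunner` (plus a project and staging bucket) in the pipeline options would execute the same code on the Dataflow service instead of locally.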
For this recipe, let's consider a weather station (an IoT device) that sends temperature data to GCP. The device emits data continuously, and the readings are stored in Google Cloud Storage for later analytics processing. Given the intermittent nature of connectivity between the device and GCP, we need a solution that receives the messages, processes them, and stores them. For this solution, we'll create a Dataflow pipeline using a Google-provided template, Cloud Pub/Sub to Cloud Storage Text, as sketched below.
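The recipe launches the template interactively, but the same template can also be launched programmatically. The sketch below uses the Dataflow REST API through the Google API Python client; the project ID, region, job name, topic, and bucket names are placeholders, and the template path and parameter names should be verified against the current Google-provided template documentation.

```python
# Sketch: launch the Google-provided Pub/Sub to Cloud Storage Text template
# via the Dataflow REST API (pip install google-api-python-client; requires
# Application Default Credentials). All resource names are placeholders.
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")

request = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="us-central1",
    # Location of the Google-provided template in its public bucket.
    gcsPath="gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text",
    body={
        "jobName": "weather-pubsub-to-gcs",
        "parameters": {
            "inputTopic": "projects/my-project/topics/temperature-readings",
            "outputDirectory": "gs://my-weather-bucket/readings/",
            "outputFilenamePrefix": "temp-",
        },
    },
)
response = request.execute()
print("Launched Dataflow job:", response["job"]["id"])
```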
Getting ready
The following are the initial setup and verification steps to complete before creating the network...