Summary
In this chapter, we covered the main big data services that relate to the exam. We looked at each service and showed how they can be used at the different stages of an end-to-end solution. We took the time to see how to configure Pub/Sub, Dataflow, and BigQuery from the GCP console, and we also discussed Dataproc and Cloud IoT Core.
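As a quick illustration of the ingest stage, the following is a minimal sketch that publishes a message to a Pub/Sub topic using the Python client library. The project ID (my-project) and topic name (sensor-events) are placeholders, not names used in the chapter:

from google.cloud import pubsub_v1

# Placeholder project and topic names; substitute your own.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "sensor-events")

# Pub/Sub payloads are raw bytes; publish() returns a future
# that resolves to the server-assigned message ID.
future = publisher.publish(topic_path, b'{"temperature": 21.5}')
print(future.result())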
Exam Tip
The key takeaway from this chapter is to understand which services map to the ingest, processing, and analysis stages of a data pipeline.
Then, we looked at the processing stage of our solution. Cloud Dataflow deploys Google Compute Engine instances to execute our Apache Beam pipeline, which processes data from Pub/Sub and passes it on to further stages for analysis or storage. We also showed how easily we can create a pipeline in the GCP console that pulls messages from Pub/Sub for analysis in BigQuery.
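As a rough sketch of this flow (not the exact pipeline built in the chapter), a streaming Beam pipeline in Python that reads from a Pub/Sub topic and writes rows to BigQuery could look like the following; the project, topic, dataset, and table names are placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Running on Dataflow additionally requires options such as
# --runner=DataflowRunner, --project, --region, and --temp_location.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read raw message bytes from a placeholder Pub/Sub topic.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/sensor-events")
        # Turn each payload into a dictionary matching the BigQuery schema.
        | "ToRow" >> beam.Map(lambda data: {"payload": data.decode("utf-8")})
        # Append rows to a placeholder table, creating it if needed.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.sensor_readings",
            schema="payload:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )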
Afterward, we covered BigQuery and saw that it is a data warehouse. It is designed to make data analysts more productive, crunching...