Summary
In this chapter, you learned how to ingest data into Google Cloud from various sources, and you explored key concepts for processing data in Google Cloud.
You then learned about exploring and visualizing data using Vertex AI and BigQuery. Next, you learned how to clean and prepare data for ML workloads using Jupyter notebooks. Building on that, you created an automated data pipeline to perform the same transformations at production scale in batch mode using Apache Spark on Google Cloud Dataproc, and you orchestrated that entire process automatically using Apache Airflow on Google Cloud.
We then covered important concepts and tools for processing streaming data, and finally, you built your own streaming data processing pipelines using Apache Beam on Google Cloud Dataflow.
In the next chapter, we will spend additional time on data processing and preparation, with a specific focus on the concept of feature engineering.