As your business grows, parallel processing becomes essential for workloads such as streaming and querying large datasets, and machine learning becomes important for analytics. On Google Cloud Platform (GCP), Dataproc is a managed, cost-effective service for running Apache Spark and Hadoop workloads.
In this chapter, you will learn about the following topics:
- Google Cloud Dataproc
- Dataproc cluster instances
- Running jobs on Dataproc clusters
- Scaling Dataproc clusters
- Deleting clusters