We have not spent much time discussing machine learning on GCP in this book, but at a very high level, you have two choices:
- TensorFlow and the Cloud ML Engine
- SparkML and Dataproc
Both options are good. The Cloud ML Engine supports distributed training and prediction and is tightly coupled with TensorFlow, which is a great technology for deep learning. So, on balance, this option is probably the better one.
SparkML is a great option too, though. Spark is possibly the hottest big data technology today, so there are a lot of existing Spark applications and a lot of talented Spark developers out there. If your organization already uses a lot of Spark, you might find SparkML on Dataproc to be the better option, at least until TensorFlow and the ML Engine catch on in your firm.
...