Section 3 – Scaling and Tuning ML Works
Having covered how to set up a training job through the various means of TensorFlow Enterprise model development, it is now time to scale the training process using a cluster of GPUs or TPUs. You will learn how to leverage distributed training strategies and implement hyperparameter tuning to scale and improve your model training experiments.
In this part, you will learn how to set up GPUs and TPUs in a GCP environment and submit a model training job to them. You will also learn about the latest hyperparameter tuning API and how to run it at scale using GCP resources.
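As a preview of the distributed training approach covered in the chapters that follow, here is a minimal sketch using TensorFlow's `tf.distribute` API with Keras. The choice of `MirroredStrategy` and the toy model are assumptions for illustration only; the chapters discuss the strategies appropriate for GPU and TPU clusters on GCP.

```python
import tensorflow as tf

# Illustrative only: MirroredStrategy replicates training across the GPUs
# available on a single machine. Other strategies (e.g. TPUStrategy) apply
# to different hardware configurations.
strategy = tf.distribute.MirroredStrategy()

# The model and optimizer must be created inside the strategy scope so that
# their variables are mirrored across the available devices.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )

# The training call itself is unchanged, e.g.:
# model.fit(train_dataset, epochs=5)
```

The key design point is that only model construction and compilation move inside the strategy scope; the rest of the training code stays the same, which is what makes it straightforward to scale an existing experiment.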
This section comprises the following chapters:
- Chapter 5, Training at Scale
- Chapter 6, Hyperparameter Tuning