Summary
In this chapter, we covered an important practical aspect of machine learning: how to optimize the model training process. We explored the power of distributed training using PyTorch. First, we discussed distributed training on CPUs, re-training the model from Chapter 1, Overview of Deep Learning Using PyTorch, using the principles of distributed training.
While working through this exercise, we learned about the PyTorch APIs that enable distributed training with only a few code changes. Finally, we ran the new training script and observed a significant speedup from distributing the training across multiple processes.
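To recap the essence of those changes, the following is a minimal sketch of CPU-based distributed training. It uses a toy linear model and dummy data rather than the model from Chapter 1; the gloo backend, torch.multiprocessing.spawn, and DistributedDataParallel are the core PyTorch APIs involved, while the specific model, data, and hyperparameter values shown here are illustrative assumptions only.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # Each process joins the same process group; "gloo" is the CPU backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # A small illustrative model; wrapping it in DDP synchronizes gradients
    # across processes during backward().
    model = nn.Linear(10, 2)
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(5):  # a few dummy training steps with random data
        inputs = torch.randn(16, 10)
        targets = torch.randint(0, 2, (16,))
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # gradients are all-reduced across processes here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4  # number of CPU processes to launch
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)

The key pattern is that each spawned process runs the same training function, differing only in its rank, and DistributedDataParallel keeps the model replicas synchronized.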
In the second half of this chapter, we briefly discussed distributed training on GPUs using PyTorch. We highlighted the basic code changes needed for model training to work on multiple GPUs in a distributed fashion, while leaving out the actual execution for you as an exercise.
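For reference, the sketch below illustrates the kind of changes involved on GPUs, again with an assumed toy model rather than the chapter's own code: the backend switches to nccl, each process pins its model and data to one GPU, and DistributedDataParallel is given that device ID.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # "nccl" is the recommended backend for GPU-to-GPU communication.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(10, 2).to(device)        # move the model to this process's GPU
    ddp_model = DDP(model, device_ids=[rank])  # gradients sync across GPUs
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy data, placed on the same GPU as the model.
    inputs = torch.randn(16, 10, device=device)
    targets = torch.randint(0, 2, (16,), device=device)
    optimizer.zero_grad()
    loss_fn(ddp_model(inputs), targets).backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # one process per available GPU
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)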
In the next chapter,...