Distributed training on GPUs with CUDA
Throughout the various exercises in this book, you may have noticed a common line of PyTorch code:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
This code simply looks for the available compute device and prefers cuda (which uses the GPU) over cpu. This preference stems from the computational speedups that GPUs provide, through parallelization, for routine neural network operations such as matrix multiplications and additions.
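To make this concrete, here is a minimal sketch of how the selected device is typically used; the model and tensor shapes are hypothetical placeholders, not code from the exercise:

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A hypothetical model and batch; both must live on the same device
model = nn.Linear(784, 10).to(device)          # move the parameters to the GPU (if available)
inputs = torch.randn(32, 784, device=device)   # create the batch directly on that device
outputs = model(inputs)                        # the matrix multiplication now runs on the GPU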
In this section, we will learn how to speed this up further with the help of distributed training on GPUs. We will build upon the work done in the previous exercise. Note that most of the code looks the same. In the following steps, we will highlight the changes. Executing the script has been left to you as an exercise. The full code is available here: https://github.com/PacktPublishing/Mastering-PyTorch/blob/master/Chapter11/convnet_distributed_cuda.py. Let's...
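Before walking through the individual changes, the following is a minimal, self-contained sketch of what distributed training on GPUs typically looks like with PyTorch's DistributedDataParallel. It is not the exact script from the repository; the linear model, master address/port, and hyperparameters are placeholders. The key differences from a CPU-only setup are the nccl backend, pinning each process to its own GPU, and passing device_ids to DistributedDataParallel:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU; the 'nccl' backend is the usual choice for CUDA tensors
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    # Pin this process to its own GPU
    device = torch.device(f'cuda:{rank}')
    torch.cuda.set_device(device)

    # A hypothetical model; the actual exercise trains a convolutional network
    model = nn.Linear(784, 10).to(device)
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    # A single dummy training step; in the real script, batches come from a
    # DataLoader with a DistributedSampler and are moved to `device` each iteration
    inputs = torch.randn(32, 784, device=device)
    targets = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = criterion(ddp_model(inputs), targets)
    loss.backward()   # gradients are all-reduced across the GPUs here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(train, args=(world_size,), nprocs=world_size)

With this structure in mind, the changes highlighted in the following steps should be easy to place.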