Distributed training on GPUs with CUDA
Throughout the various exercises in this book, you may have noticed a common line of PyTorch code:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
This code simply looks for the available compute device and prefers CUDA (GPU) over CPU. The preference exists because GPUs can provide computational speedups on common neural network operations, such as matrix multiplications and additions, through parallelization. In this section, we look into speeding up training further with the help of distributed training on GPUs. We will build on the work done in the previous exercise; most of the code looks the same, and in the following steps we will highlight the changes. Executing the script is left as an exercise for the reader. The full code is available on GitHub [7]:
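Before walking through the changes, here is a minimal sketch of how such a multi-GPU script is typically launched, with one process per GPU and the NCCL backend in place of the CPU-oriented Gloo backend. The MASTER_ADDR/MASTER_PORT values and the train() signature here are illustrative assumptions, not the book's exact code:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def train(rank, world_size):
    # Rendezvous settings; hypothetical local-machine defaults.
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    # 'nccl' is the backend of choice for GPU-to-GPU communication,
    # replacing the 'gloo' backend used for CPU-based distributed training.
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)  # pin this process to its own GPU
    # ... model creation, data loading, and the training loop go here ...
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()  # one process per GPU
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```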
- While the imports and model architecture definition code are exactly the same as before, there are a few changes in the train() function, sketched below...
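One common shape for those changes is to place each process's model replica on its own GPU and wrap it for gradient synchronization. The sketch below uses DistributedDataParallel with a placeholder architecture and a hypothetical helper name; the book's exact edits may differ:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def build_model_for_rank(rank):
    # Hypothetical helper; assumes the process group is already initialized.
    # Placeholder architecture standing in for the exercise's model.
    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    device = torch.device(f'cuda:{rank}')
    model = model.to(device)  # parameters live on this process's GPU
    # DDP synchronizes gradients across processes during backward().
    return DDP(model, device_ids=[rank]), device

# Inside the training loop, each batch must move to the same device, e.g.:
#     data, target = data.to(device), target.to(device)
```

Typically, a torch.utils.data.distributed.DistributedSampler is also attached to the DataLoader so that each process trains on a distinct shard of the dataset.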