Optimizing accelerator performance
There are two ways to approach accelerator optimization, and both are important. The first is from a hyperparameter perspective. The second is from an infrastructure perspective. Let’s break them down!
Hyperparameters
All of Chapter 7 is devoted to picking the right hyperparameters, and optimizing GPU performance is a large driver for that. Importantly, as the number of GPUs in your cluster (what we call your world size) changes, you’ll need to modify your hyperparameters to accommodate that change. There’s also a core trade-off between increasing your overall job throughput, say by maxing out your batch size, and using a smaller batch size, which can ultimately give you higher accuracy. Later in the book, you’ll learn how to use hyperparameter tuning to bridge that gap.
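To make the link between world size and hyperparameters concrete, here is a minimal sketch in Python. It assumes data-parallel training, where the global batch size is the per-device batch size multiplied by the world size. It also assumes the linear learning-rate scaling heuristic, which is one common starting point rather than a rule this chapter prescribes. The function names and the numbers are hypothetical, chosen only for illustration.

```python
def global_batch_size(per_device_batch: int, world_size: int) -> int:
    """Global batch size under data parallelism (assumption: every
    GPU processes the same per-device batch on each step)."""
    return per_device_batch * world_size


def scaled_learning_rate(base_lr: float, base_world_size: int,
                         new_world_size: int) -> float:
    """Linear scaling heuristic (an assumption, not a guarantee):
    scale the learning rate in proportion to the world size."""
    return base_lr * new_world_size / base_world_size


if __name__ == "__main__":
    # Hypothetical baseline: 8 GPUs, 32 samples per GPU, learning rate 1e-4.
    base_world_size, per_device_batch, base_lr = 8, 32, 1e-4
    for world_size in (8, 16, 64):
        batch = global_batch_size(per_device_batch, world_size)
        lr = scaled_learning_rate(base_lr, base_world_size, world_size)
        print(f"world_size={world_size} global_batch={batch} lr={lr:.2e}")
```

Treat the scaled values as a starting point only. Whether a larger global batch preserves accuracy depends on your model and dataset, which is exactly the trade-off that hyperparameter tuning is meant to resolve.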
Infrastructure optimizations for accelerators on AWS
Here, you’re going to learn about five key topics that can determine how well your scripts...