Summary
In this chapter on hyperparameter tuning, you learned what hyperparameters are, including batch size, learning rate, number of epochs, number of attention heads, sequence length, and more. You learned how to use hyperparameter tuning to improve the performance of your model, along with top strategies for doing so. You learned how to scale up your tuning by starting with roughly 1% of your dataset and then adjusting your key hyperparameters as a function of your overall GPU world size. Finally, you learned about key features for doing all of this on Amazon SageMaker, illustrated by the sketch that follows.
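As a quick reminder of how those SageMaker features fit together, here is a minimal sketch using the SageMaker Python SDK's HyperparameterTuner. The entry point script, IAM role, instance type, metric regex, search ranges, and S3 paths are all placeholders for illustration, not the chapter's recommended values; in keeping with the scaling strategy above, the tuning job is pointed at a small (roughly 1%) sample of the training data.

```python
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

# Base estimator; entry_point, role, and instance settings are placeholders.
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p4d.24xlarge",
    framework_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 3, "sequence_length": 512},
)

# Illustrative search ranges for two of the hyperparameters covered in this chapter.
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
    "batch_size": IntegerParameter(8, 64),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    metric_definitions=[
        {"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}
    ],
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",
    max_jobs=20,
    max_parallel_jobs=4,
)

# Launch the tuning job against a small (~1%) sample of the dataset;
# the S3 path is a placeholder.
tuner.fit({"train": "s3://my-bucket/train-sample/"})
```

Once the small-scale search has identified a promising region, you would rerun training at full dataset size and full GPU world size, adjusting batch size and learning rate accordingly as discussed in this chapter.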
In the next chapter, we’ll learn about large-scale distributed training!