Hyperparameters – batch size, learning rate, and more
Hyperparameters govern many of the critical decision points in deep learning. They act as an intermediary between you, your model, your dataset, and your overall compute environment. You'll use settings such as batch size, learning rate, and number of attention heads to balance your solution to the problem at hand, manage costs, and ensure strong performance from your model during both training and inference.
Batch size tells your training algorithm how many records from your dataset to load into memory for each training step. If you try to load more records than your GPU can hold at one time, you'll hit an out-of-memory (OOM) error. A large batch size lets you step through your training loop quickly, but it risks failing to capture all the variation in your dataset if you do not run the optimizer frequently enough. This...
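To make this concrete, here is a minimal sketch in PyTorch, using a hypothetical in-memory toy dataset, showing how the batch_size argument controls how many samples are loaded per training step:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 1,000 samples with 128 features each.
features = torch.randn(1_000, 128)
labels = torch.randint(0, 2, (1_000,))
dataset = TensorDataset(features, labels)

# batch_size controls how many samples are loaded per step.
# Larger values mean fewer optimizer steps per epoch, but more
# GPU memory consumed by each step.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch_features, batch_labels in loader:
    # Each iteration yields one batch; with 1,000 samples and
    # batch_size=64, that's 16 steps per epoch (the last batch has 40).
    pass  # forward pass, loss, and optimizer step would go here
```

With this setup, doubling batch_size halves the number of optimizer steps per epoch while roughly doubling the activation memory each step requires, which is the trade-off described above.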