We previously presented how the parameters, P, of a neural network (that is, all the weight and bias parameters of its layers) can be iteratively updated during training to minimize the loss, L, by backpropagating its gradient. If this gradient descent process could be summarized in a single equation, it would be the following:

$$P_{i+1} = P_i - \epsilon \, \nabla_P L(P_i)$$
Here, $\epsilon$ is the learning rate hyperparameter, which amplifies or dampens how much the network's parameters are updated with regard to the gradient of the loss at every training iteration. While we mentioned that the learning rate value should be set with care, we did not explain how or why. The reasons for caution in this setup are threefold.
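To make the update rule concrete, the following is a minimal NumPy sketch of the vanilla gradient descent step described above; the function name, the toy loss L(P) = P², and the chosen learning rate value are illustrative assumptions, not taken from the original text.

```python
import numpy as np

def gradient_descent_step(params, grads, learning_rate=0.1):
    """Apply one vanilla gradient descent update: P <- P - epsilon * dL/dP."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy example: minimize L(P) = P^2, whose gradient is dL/dP = 2P.
params = [np.array(5.0)]
for step in range(100):
    grads = [2.0 * p for p in params]  # gradient of the toy loss
    params = gradient_descent_step(params, grads, learning_rate=0.1)

print(params[0])  # moves toward 0, the minimum of L(P) = P^2
```

In this sketch, too small a learning rate would make the parameter crawl toward the minimum, while too large a value would make it overshoot or diverge, which is exactly why the hyperparameter deserves the caution discussed next.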