Understanding gradient descent
A useful way to think about the loss of a deep learning model is as a point on a three-dimensional loss landscape with many hills and valleys, where lower points (valleys) correspond to more optimal parameter values, as shown in Figure 2.4.
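Gradient descent is the procedure that moves a model downhill on this landscape: at each step, the parameters are nudged in the direction opposite the gradient of the loss. The following is a minimal sketch using a hypothetical one-parameter loss, `loss(w) = (w - 3)^2`, whose valley (minimum) sits at `w = 3`; the starting point and learning rate are illustrative choices.

```python
def loss(w):
    # A toy one-dimensional loss surface with a single valley at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative of the loss: d/dw (w - 3)^2 = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point on the loss landscape
lr = 0.1   # learning rate (step size)

for step in range(50):
    w -= lr * grad(w)  # move downhill, against the gradient

# After enough steps, w has descended close to the valley at 3.0
print(w)
```

Each update shrinks the distance to the minimum by a constant factor here, so the parameter converges toward the bottom of the valley; on a real, high-dimensional landscape the same rule is applied to every parameter at once.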
Figure 2.4 – An example loss landscape
In reality, however, we can only approximate these loss landscapes: the true landscape has one dimension per parameter, and a neural network's parameters can take on an effectively infinite number of value combinations. The most common way practitioners monitor the behavior of loss during training and validation is to plot a two-dimensional line graph, with the epochs executed on the x axis and the loss on the y axis. An epoch is a single complete pass through the entire dataset during the training process of a neural network. The loss landscape in Figure 2.4 is a three-dimensional approximation of a neural network's true loss landscape. To visualize the three-dimensional loss landscape in...
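The monitoring loop described above can be sketched as follows. A hypothetical one-weight model is fit by gradient descent on a stand-in dataset; each epoch is one full pass over that dataset, and the mean loss per epoch is recorded so it can be plotted as the two-dimensional line graph just described (epochs on the x axis, loss on the y axis). The dataset, learning rate, and file name are illustrative assumptions.

```python
# Toy dataset of (x, y) pairs following the rule y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05
epoch_losses = []  # one entry per epoch, for the loss curve

for epoch in range(20):
    total = 0.0
    for x, y in data:                  # one epoch = one full pass over the dataset
        pred = w * x
        total += (pred - y) ** 2       # squared-error loss for this sample
        w -= lr * 2 * (pred - y) * x   # gradient step on this sample
    epoch_losses.append(total / len(data))

# Plot epochs (x axis) against loss (y axis); matplotlib is assumed available
try:
    import matplotlib.pyplot as plt
    plt.plot(range(1, len(epoch_losses) + 1), epoch_losses)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.savefig("loss_curve.png")
except ImportError:
    pass
```

A downward-sloping curve like the one this produces is the practical, two-dimensional view of descending the high-dimensional landscape.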