Now that we understand backpropagation and how gradients are computed, you might be wondering what purpose it serves and what it has to do with training our MLP. Recall from Chapter 1, Vector Calculus, where we covered partial derivatives, that a partial derivative tells us how changing a single parameter affects the output of a function. When we can plot a function and examine its first and second derivatives, we can determine its local and global minima and maxima analytically. Our situation isn't that straightforward: the model has no way of knowing where the optimum is or how to reach it, so instead we use the gradients from backpropagation, together with gradient descent, as a guide to move us toward a (hopefully global) minimum.
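To make this concrete, here is a minimal sketch (not the book's code) of how the two pieces fit together: backpropagation computes the gradient of the loss with respect to each parameter of a tiny one-hidden-layer MLP, and gradient descent steps each parameter against its gradient. The data, layer sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

# Assumed toy setup: 8 samples, 3 features, a regression target.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer parameters
lr = 0.1                                        # learning rate (step size)

for step in range(100):
    # Forward pass
    h = np.tanh(X @ W1 + b1)          # hidden activations
    y_hat = h @ W2 + b2               # predictions
    loss = np.mean((y_hat - y) ** 2)  # mean squared error

    # Backward pass (backpropagation): apply the chain rule layer by layer
    d_yhat = 2 * (y_hat - y) / len(X)   # dL/dy_hat
    dW2 = h.T @ d_yhat                  # dL/dW2
    db2 = d_yhat.sum(axis=0)            # dL/db2
    d_h = d_yhat @ W2.T                 # dL/dh
    d_pre = d_h * (1 - h ** 2)          # through tanh: dL/d(pre-activation)
    dW1 = X.T @ d_pre                   # dL/dW1
    db1 = d_pre.sum(axis=0)             # dL/db1

    # Gradient-descent update: move each parameter against its gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```

Repeating this forward/backward/update loop is what gradually drives the loss downward; the model never "sees" the loss surface, it only follows the local slope that backpropagation provides.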
In Chapter 4, Optimization, we learned about gradient descent and how we...