Gradient descent is one of the most popular and widely used optimization algorithms. It is a first-order optimization algorithm, meaning that it uses only the first-order derivative of the function being minimized. As we saw in Chapter 1, Introduction to Deep Learning, we used gradient descent to minimize the loss by calculating the first-order derivative of the loss function with respect to the weights of the network.
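As a minimal sketch of this idea (not code from Chapter 1), the following Python snippet applies gradient descent to a simple one-dimensional function, f(x) = (x - 3)^2, whose first-order derivative is f'(x) = 2(x - 3); the function, learning rate, and number of iterations are illustrative choices:

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)**2.
# The first-order derivative is f'(x) = 2 * (x - 3).

def f(x):
    return (x - 3) ** 2

def grad_f(x):
    return 2 * (x - 3)

x = 0.0   # initial guess (illustrative)
lr = 0.1  # learning rate (step size, illustrative)

for step in range(50):
    x = x - lr * grad_f(x)  # take a step against the gradient

print(x, f(x))  # x converges toward the minimizer x = 3
```

Exactly the same update rule applies when x is a vector of network weights and f is the loss function; the only difference is that the derivative becomes a gradient vector.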
Gradient descent is not only applicable to neural networks; it can be used wherever we need to find the minimum of a function. In this chapter, we will go deeper into gradient descent, starting with the basics, and learn several variants of the algorithm that are used for training neural networks. First, we will understand Stochastic Gradient Descent (SGD).