Optimizers define how a neural network learns. They define the value of parameters during the training such that the loss function is at its lowest.
Gradient descent is an optimization algorithm for finding the minima of a function or the minimum value of a cost function. This is useful to us as we want to minimize the cost function. So, to find the local minimum, we take steps proportional to the negative of the gradient.
Let's go through a very simple example in one dimension, shown in the following plot:
On the y axis, we have the cost (the result of the cost function), and on the x axis, we have the particular weight we are trying to choose (we chose the random weight). The weight minimizes the cost function and we can see that, basically, the parameter value is at the bottom of the parabola. We have to minimize the value of the cost function to the minimum value. Finding the minimum is really...