Optimizers in TensorFlow

From high school calculus, you may recall that a function's first derivative is zero at its maxima and minima. The gradient descent algorithm is based on the same principle: the coefficients (weights and biases) are adjusted in the direction of the negative gradient of the loss function, so that the loss decreases with each step. In regression, we use gradient descent to minimize the loss function and obtain the coefficients. In this recipe, you will learn how to use the gradient descent optimizer of TensorFlow and some of its variants.
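To make the idea concrete, here is a minimal, hand-rolled sketch (not part of the original recipe) that minimizes a one-variable function, f(w) = (w - 3)^2, by repeatedly stepping against its derivative; the function, starting point, and learning rate are all illustrative assumptions:

# Gradient descent on f(w) = (w - 3)**2; its derivative 2*(w - 3) vanishes at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # initial guess
learning_rate = 0.1  # step size (often written as alpha or eta)

for _ in range(100):
    w = w - learning_rate * grad(w)  # move against the gradient

print(round(w, 4))   # converges to ~3.0, the point where the derivative is zero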
Getting ready
The coefficients (W and b) are updated in proportion to the negative gradient of the loss function; for a learning rate alpha, each step applies W ← W - alpha * ∂loss/∂W and b ← b - alpha * ∂loss/∂b. There are three variations of gradient descent, depending on how much of the training data is used for each update...
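As a rough sketch of what this update looks like in code, the toy linear regression below lets TensorFlow compute the gradients and apply the step. It assumes the TensorFlow 2.x Keras API (tf.keras.optimizers.SGD); the recipe's own code may instead use the TF 1.x tf.train.GradientDescentOptimizer, and the data, learning rate, and variable names are purely illustrative. It uses the full training set at every step, i.e. plain (batch) gradient descent:

import tensorflow as tf

# Toy data roughly following y = 3x + 2
X = tf.constant([[1.0], [2.0], [3.0], [4.0]])
Y = tf.constant([[5.1], [7.9], [11.2], [13.8]])

W = tf.Variable(tf.zeros([1, 1]))  # weight
b = tf.Variable(tf.zeros([1]))     # bias

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)  # plain gradient descent

for step in range(1000):
    with tf.GradientTape() as tape:
        Y_pred = tf.matmul(X, W) + b
        loss = tf.reduce_mean(tf.square(Y_pred - Y))  # mean squared error
    grads = tape.gradient(loss, [W, b])               # dloss/dW, dloss/db
    optimizer.apply_gradients(zip(grads, [W, b]))     # W <- W - alpha * dloss/dW, same for b

print(W.numpy(), b.numpy())  # ends up near W ~ 3, b ~ 2, the least-squares fit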