Mini-Batch SGD with PyTorch
Let's recap what we have learned so far. We started by implementing a gradient descent algorithm in NumPy. Then we were introduced to PyTorch, a modern deep learning library, and in the last exercise we implemented an improved version of the gradient descent algorithm with it. Now let's dig into more details about gradient descent.
There are three types of gradient descent algorithms:
- Batch gradient descent
- Stochastic gradient descent
- Mini-batch stochastic gradient descent
While batch gradient descent computes the gradients of the model parameters using the entire dataset, stochastic gradient descent computes them using a single sample from the dataset. Gradients estimated from a single sample are extremely noisy, which makes the updates unreliable. For this reason, most applications of stochastic gradient descent use a mini-batch of a handful of samples to compute the gradients.
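To make the distinction concrete, here is a minimal sketch of a mini-batch SGD training loop in PyTorch. The toy dataset, batch size, learning rate, and linear model are all illustrative assumptions, not part of the earlier exercises; the key point is that `batch_size` in the `DataLoader` controls which flavor of gradient descent we get.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy regression data: 1,000 samples with 3 features each.
X = torch.randn(1000, 3)
y = X @ torch.tensor([2.0, -1.0, 0.5]) + 0.1 * torch.randn(1000)

dataset = TensorDataset(X, y)

# batch_size selects the gradient descent variant:
#   batch_size = len(dataset) -> batch gradient descent
#   batch_size = 1            -> stochastic gradient descent
#   batch_size = 32 (etc.)    -> mini-batch stochastic gradient descent
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()         # clear gradients from the previous step
        pred = model(xb).squeeze(-1)  # forward pass on the current mini-batch only
        loss = loss_fn(pred, yb)
        loss.backward()               # gradients estimated from this mini-batch
        optimizer.step()              # update parameters using the mini-batch gradient
```

Because the gradient at each step is averaged over 32 samples rather than one, the updates are far less noisy than pure stochastic gradient descent, yet each step is much cheaper than a full pass over the dataset.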