Regularizing a neural network with L2 regularization
Just like linear models, whether linear regression or logistic regression, neural networks have weights. So, just as with a linear model, L2 penalization can be applied to those weights to regularize the neural network. In this recipe, we will apply L2 penalization to a neural network trained on the MNIST handwritten digits dataset.
As a reminder, when training a neural network on this task in Chapter 6, there was slight overfitting after 20 epochs: the accuracy was about 97% on the train set and 95% on the test set. Let’s try to reduce this overfitting by adding L2 regularization in this recipe.
Getting ready
Just like for linear models, L2 regularization simply adds an L2 term to the loss. Given the weights $W = (w_1, w_2, \dots)$, the term added to the loss is $\lambda \sum_i w_i^2$, where $\lambda$ controls the regularization strength. The consequence of this added term is that the weights are more constrained and must stay close to zero to keep the loss small.
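To make this concrete, here is a minimal sketch, assuming PyTorch, of how such a penalty can be added to the loss by hand. The model architecture, the l2_lambda value, and the loss_with_l2 helper are illustrative choices, not the exact code of this recipe:

import torch
import torch.nn as nn

# Illustrative model for 28x28 MNIST images flattened to 784 features
model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
l2_lambda = 1e-4  # illustrative regularization strength (a hyperparameter to tune)

def loss_with_l2(outputs, targets):
    # Sum of squared weights over the weight matrices (biases are typically not penalized)
    l2_term = sum((p ** 2).sum() for name, p in model.named_parameters() if "weight" in name)
    # Cross-entropy loss plus lambda times the L2 term
    return criterion(outputs, targets) + l2_lambda * l2_term

In practice, the same effect is often obtained through the optimizer: for plain SGD, passing weight_decay=l2_lambda to torch.optim.SGD applies an equivalent L2 penalty without modifying the loss function explicitly.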