In previous chapters, we learned that a feedforward neural network is essentially a complex function that maps an input to a corresponding target or label by learning the underlying distribution from the training data. Recall that during training, once the error has been computed in the forward pass, backpropagation is used to update the parameters so as to reduce the loss and better approximate the data distribution. We also learned about the capacity of neural networks, the bias-variance trade-off, and how neural networks can underfit or overfit the training data, which prevents them from performing well on unseen (test) data; that is, a generalization error occurs.
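To make the forward pass/backpropagation loop concrete, here is a minimal sketch of a training loop, assuming a PyTorch-style setup for illustration (the architecture, data, and hyperparameters are placeholders rather than an example from earlier chapters):

```python
import torch
import torch.nn as nn

# A small feedforward network (hypothetical architecture for illustration)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch of inputs and targets
inputs = torch.randn(64, 10)
targets = torch.randn(64, 1)

for epoch in range(100):
    optimizer.zero_grad()                  # clear gradients from the previous step
    predictions = model(inputs)            # forward pass
    loss = loss_fn(predictions, targets)   # error between predictions and targets
    loss.backward()                        # backpropagation: compute gradients of the loss
    optimizer.step()                       # update parameters to reduce the loss
```

Each iteration follows the pattern described above: compute the error in the forward pass, backpropagate to obtain gradients, and then update the parameters to reduce the loss.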
Before we get into what exactly regularization is, let's revisit overfitting and underfitting. Neural networks, as we know, are universal function approximators...