In the previous chapter, we learned about (deep) feedforward neural networks and how they are structured. We saw how these architectures can use their hidden layers and non-linear activations to perform well on challenging tasks that linear models cannot handle. We also saw that neural networks tend to overfit to the training data by learning the noise in the dataset, which drives training error down while inflating error on the test data. Since our goal is to build models that generalize well, we want to close the gap between training and test performance, so that a model does nearly as well on unseen data as on the data it was trained on. This is the goal of regularization: to reduce test error, sometimes at the expense of greater training error.
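To make this trade-off concrete before turning to neural-network-specific techniques, here is a minimal NumPy sketch (the toy dataset, polynomial model, and penalty strengths are illustrative assumptions, not taken from the text). It fits a flexible polynomial to a few noisy points with and without an L2 penalty on the weights and compares training and test error; as the penalty grows, training error typically rises slightly while test error drops.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a smooth underlying function.
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(20)   # small training set: easy to overfit
x_test, y_test = make_data(200)    # held-out data to estimate test error

def design_matrix(x, degree=9):
    # High-degree polynomial features: flexible enough to fit the noise.
    return np.vander(x, degree + 1, increasing=True)

def fit_l2(x, y, lam):
    # L2-regularized least squares (ridge regression), closed form:
    #   w = (X^T X + lam * I)^{-1} X^T y
    X = design_matrix(x)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, x, y):
    return np.mean((design_matrix(x) @ w - y) ** 2)

for lam in [0.0, 1e-3, 1e-1]:
    w = fit_l2(x_train, y_train, lam)
    print(f"lambda={lam:g}  train MSE={mse(w, x_train, y_train):.3f}  "
          f"test MSE={mse(w, x_test, y_test):.3f}")
```

With lam = 0 the model interpolates the noise and the gap between training and test error is large; a moderate penalty accepts a small increase in training error in exchange for a noticeably lower test error, which is exactly the behavior regularization aims for.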
In this chapter, we will cover a variety of regularization methods, how they work, and why certain techniques are preferred over others. This includes...