In the previous chapter, we saw the diminishing returns from further training iterations on neural networks in terms of their predictive ability on holdout or test data (that is, data not used to train the model). This is because complex models may memorize some of the noise in the data rather than learning the general patterns. These models then perform much worse when predicting new data. There are some methods we can apply to make our model generalize, that is, fit the overall patterns. These are called regularization and aim to reduce testing errors so that the model performs well on new data.
The most common regularization technique used in deep learning is dropout. However, we will also discuss two other regularization techniques that have a basis in regression and deep learning. These two regularization techniques are L1 penalty...