In the preceding section, we learned about applying penalties to the norm of the weights to regularize them, as well as other approaches, such as dataset augmentation and early stopping. However, there is another effective approach that is widely used in practice, known as dropout.
So far, when training neural networks, all of the weights have been learned together. Dropout alters this idea by having the network learn only a fraction of the weights in each iteration. The motivation is to avoid co-adaptation, which occurs when we train the entire network over all of the training data and some connections become stronger than others; these stronger connections contribute more to the network's predictions and overpower the weaker connections, which are effectively ignored. As we train the network over more iterations, the stronger connections become increasingly co-adapted, while the weaker ones are updated less and less.
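To make this concrete, the following is a minimal NumPy sketch of the core idea, assuming an inverted-dropout implementation; the function name `dropout_forward` and the drop probability of 0.5 are illustrative choices, not taken from any particular library. During training, each hidden unit is kept with probability 1 - drop_prob and zeroed otherwise, and the surviving activations are scaled by 1 / (1 - drop_prob) so that their expected value is unchanged and no rescaling is needed at inference time.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout_forward(activations, drop_prob=0.5, training=True):
    """Inverted dropout applied to a layer's activations.

    Each unit is kept with probability (1 - drop_prob); kept units are
    scaled by 1 / (1 - drop_prob) so the expected activation matches the
    value used at inference time, where dropout is disabled.
    """
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    # Binary mask drawn anew each iteration: 1 keeps the unit, 0 drops it
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Example: a batch of 4 samples, each with 5 hidden-unit activations
hidden = np.ones((4, 5))
print(dropout_forward(hidden, drop_prob=0.5, training=True))   # roughly half zeroed, rest scaled to 2.0
print(dropout_forward(hidden, drop_prob=0.5, training=False))  # unchanged at inference
```

Because a different random mask is drawn in every iteration, no unit can rely on the constant presence of any particular other unit, which is what discourages co-adaptation.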