Introduced in Dropout: A Simple Way to Prevent Neural Networks from Overfitting (JMLR, 2014) by Hinton and his team (who made numerous contributions to deep learning), dropout consists of randomly disconnecting (dropping out) some neurons of target layers at every training iteration. This method thus takes a hyperparameter, the dropout ratio, which represents the probability that each neuron is turned off at each training step (usually set between 0.1 and 0.5). The concept is illustrated in Figure 3.13:
By artificially and randomly impairing the network, this method forces it to learn robust and concurrent features. For instance, as dropout may deactivate the neurons responsible for a key feature, the network has to figure out other significant features in order to reach the same prediction. This has the effect of developing...
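To make the mechanism concrete, here is a minimal sketch of how dropout can be applied to a layer's activations, written in plain NumPy rather than the book's framework code; the function name `dropout` and the `ratio` parameter are illustrative choices, and the scaling by `1 / (1 - ratio)` corresponds to the common inverted-dropout convention:

```python
import numpy as np

def dropout(x, ratio=0.5, training=True, rng=np.random.default_rng()):
    """Inverted dropout sketch: zero out activations with probability `ratio`.

    During training, each activation is kept with probability 1 - ratio and
    rescaled by 1 / (1 - ratio) so the expected output magnitude is preserved;
    at inference time, the input passes through unchanged.
    """
    if not training or ratio == 0.0:
        return x
    keep_prob = 1.0 - ratio
    mask = rng.random(x.shape) < keep_prob  # random binary mask per activation
    return x * mask / keep_prob

# Example: roughly `ratio` of the activations are zeroed at each call.
activations = np.ones((4, 8))
print(dropout(activations, ratio=0.25))
```

In practice, frameworks provide this as a ready-made layer (for instance, `tf.keras.layers.Dropout(rate)` in Keras), which handles the training-versus-inference switch automatically.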