Regularization and dropout
Overfitting is a common issue in deep models. Their extremely high capacity can become problematic even with very large datasets, because the ability to learn the structure of the training set is not always related to the ability to generalize. A deep neural network can easily become an associative memory, but the final internal configuration might not be the most suitable to manage samples that belong to the same distribution but were never presented during the training process. This behavior becomes more likely as the complexity of the separation hypersurface increases. A linear classifier has a minimal chance of overfitting, while a polynomial classifier is far more prone to it. A combination of hundreds, thousands, or more non-linear functions yields a separation hypersurface that is beyond any possible analysis. In 1991, Hornik (in Approximation Capabilities of Multilayer Feedforward Networks, Hornik K., Neural Networks, 4/2) generalized...
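The contrast between a low-capacity and a high-capacity model can be illustrated with a small numerical sketch. It is not taken from the text: it uses polynomial regression with NumPy as a simple stand-in for classifier capacity, and the data-generating line, the noise level, the sample sizes, and the two degrees (1 and 9) are all assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical setup: a noisy linear relationship, y = 2x + 1 + noise.
rng = np.random.default_rng(0)

def noisy_line(x):
    """Return targets drawn from the assumed data-generating process."""
    return 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.shape)

# A small training set and a larger test set from the same distribution.
x_train = np.linspace(-1.0, 1.0, 12)
y_train = noisy_line(x_train)
x_test = rng.uniform(-1.0, 1.0, size=1000)
y_test = noisy_line(x_test)

def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

# Fit a low-capacity (degree 1) and a high-capacity (degree 9) model.
results = {}
for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree={degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

Under these assumptions, the degree-9 model reaches a lower training error because it has enough freedom to follow the noise in the 12 training points, while its error on unseen samples from the same distribution is expected to be higher than that of the linear model. The same mechanism, at a much larger scale, is what makes a deep network behave like an associative memory.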