Regularization
In this section we will review a few best practices for improving the training phase. In particular, regularization and batch normalization will be discussed.
Adopting regularization to avoid overfitting
Intuitively, a good machine learning model should achieve low error on training data. Mathematically, this is equivalent to minimizing the loss function on the training data given the model:

min: {loss(Training Data | Model)}
However, this might not be enough. A model can become excessively complex in order to capture all the relations inherently expressed by the training data. This increase in complexity can have two negative consequences. First, a complex model might require a significant amount of time to execute. Second, a complex model might achieve very good performance on training data but perform quite badly on validation data. This is because the model contrives relationships between many parameters that hold in the specific training context but do not generalize to unseen data; this phenomenon is known as overfitting.
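As a concrete illustration, the following minimal sketch shows one common way to apply regularization in practice: attaching an L2 penalty to a layer's weights, which adds a term proportional to the sum of squared weights to the training loss and so discourages overly complex models. It assumes a Keras-style API; the layer sizes and the regularization strength of 0.01 are illustrative choices, not recommendations.

```python
from tensorflow.keras import layers, models, regularizers

# A minimal sketch of L2 (weight-decay) regularization in a Keras-style model.
# kernel_regularizer adds lambda * sum(w^2) for this layer's weights to the
# training loss, penalizing large weights and hence overly complex models.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(0.01)),  # 0.01 is illustrative
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

With the penalty in place, the optimizer trades off fitting the training data against keeping the weights small, which typically narrows the gap between training and validation performance.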