So far, we've seen that accuracy on the training dataset is typically above 95%, while accuracy on the validation dataset is around 89%.
Essentially, this indicates that the model does not generalize well to unseen data: it learns the training dataset so closely that it picks up edge cases specific to that dataset, and these do not carry over to the validation dataset.
This scenario, where accuracy is high on the training dataset and considerably lower on the validation dataset, is known as overfitting.
Some of the typical strategies that are employed to reduce the effect of overfitting are as follows:
- Dropout
- Regularization
We will look at what impact they have in the following sections.
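Before diving into each technique, here is a minimal sketch of where the two strategies plug into a typical PyTorch setup. The architecture and hyperparameter values (layer sizes, dropout probability, weight decay) are illustrative assumptions, not the model used earlier: dropout is inserted as a layer, while L2 regularization is applied through the optimizer's weight_decay argument.

```python
import torch
import torch.nn as nn

# Hypothetical architecture for illustration: dropout sits between layers
model = nn.Sequential(
    nn.Linear(784, 1000),
    nn.ReLU(),
    nn.Dropout(0.25),   # randomly zeroes 25% of activations during training
    nn.Linear(1000, 10),
)

# weight_decay adds an L2 penalty on the weights at every update step
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x = torch.randn(32, 784)

model.train()           # dropout is active in training mode
out_train = model(x)

model.eval()            # dropout acts as the identity during evaluation
with torch.no_grad():
    out_eval = model(x)
```

Note that dropout behaves differently in training and evaluation: calling `model.eval()` before validation is what disables it, which is why the train/eval switch matters when measuring the two accuracies.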
Impact of adding dropout
We have already learned that gradients are computed whenever loss.backward() is called, and the weights are then updated in the subsequent optimizer step. Typically, we would have hundreds of thousands of parameters within a network and...