In some of the previous recipes, we noticed that the training accuracy is ~100% while the test accuracy is ~98%, which indicates overfitting on the training dataset. Let's develop an intuition for the gap between the training and test accuracies.
To understand the phenomenon that results in overfitting, let's contrast two scenarios in which we compare the training and test accuracies, along with a histogram of the weights:
- Model is run for five epochs
- Model is run for 100 epochs
The comparison of training and test accuracies for the two scenarios is as follows:
| Scenario | Training dataset | Test dataset |
| --- | --- | --- |
| 5 epochs | 97.59% | 97.1% |
| 100 epochs | 100% | 98.28% |
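
The following is a minimal sketch of how the two scenarios could be reproduced. It assumes an MNIST-style classifier with a single hidden layer of 1,000 units; the exact architecture and hyperparameters used in the earlier recipes may differ:

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Load and preprocess MNIST: flatten the images and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

def build_model():
    # One hidden layer connected to a 10-way softmax output
    model = Sequential()
    model.add(Dense(1000, input_dim=784, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# Train one model per scenario and compare training vs. test accuracy
models = {}
for epochs in (5, 100):
    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, batch_size=1024, verbose=0)
    models[epochs] = model
    train_acc = model.evaluate(x_train, y_train, verbose=0)[1]
    test_acc = model.evaluate(x_test, y_test, verbose=0)[1]
    print('%d epochs -> train: %.2f%%, test: %.2f%%'
          % (epochs, 100 * train_acc, 100 * test_acc))
```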
Once we plot the histogram of the weights connecting the hidden layer to the output layer, we will notice that the 100...
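
A minimal sketch of extracting and plotting those histograms is shown below; it assumes the `models` dictionary from the sketch above, and that the hidden-to-output weights are the kernel of the final `Dense` layer:

```python
import matplotlib.pyplot as plt

# Plot the hidden-to-output weight histograms side by side for both scenarios
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
for ax, epochs in zip(axes, (5, 100)):
    # Kernel of the final Dense layer: shape (hidden_units, 10)
    weights = models[epochs].layers[-1].get_weights()[0]
    ax.hist(weights.flatten(), bins=100)
    ax.set_title('%d epochs' % epochs)
    ax.set_xlabel('Weight value')
    ax.set_ylabel('Frequency')
plt.tight_layout()
plt.show()
```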