During training, neural networks with sufficient capacity to fit the training data tend to overfit it over many iterations, after which they are no longer able to generalize what they have learned and perform well on the test set. One way of overcoming this problem is to plot the error on the training and test sets at each iteration and look for the point at which the test error stops improving and the two curves begin to diverge. We then keep the parameters from that iteration for our model.
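To make this concrete, here is a minimal NumPy sketch (not from the source) of the same idea: a toy model is trained by gradient descent, the error on a held-out set is tracked at every iteration, and a copy of the parameters from the iteration with the lowest held-out error is kept instead of the final, overfit ones. The data, model, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy sine curve expanded into a high-degree polynomial basis,
# so the model has enough capacity to overfit.
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + 0.3 * rng.normal(size=x.shape)
X = np.vander(x, N=15, increasing=True)          # 15 polynomial features

# Split into a training set and a held-out set used only for monitoring.
X_tr, y_tr = X[:40], y[:40]
X_ho, y_ho = X[40:], y[40:]

w = np.zeros(X.shape[1])                         # model parameters
lr = 0.05
best_w, best_err, best_step = w.copy(), np.inf, 0

for step in range(20000):
    # One gradient-descent step on the training mean-squared error.
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Monitor the held-out error and remember the best parameters seen so far.
    hold_err = np.mean((X_ho @ w - y_ho) ** 2)
    if hold_err < best_err:
        best_err, best_w, best_step = hold_err, w.copy(), step

print(f"final train error:    {np.mean((X_tr @ w - y_tr) ** 2):.4f}")
print(f"final held-out error: {np.mean((X_ho @ w - y_ho) ** 2):.4f}")
print(f"best held-out error {best_err:.4f} at step {best_step}; keep those weights")
w = best_w                                        # the early-stopped parameters
```

Note that the objective being minimized is still the plain training error; early stopping only decides which set of parameters along the optimization path we keep.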
Another advantage of this method is that, unlike parameter norm penalties, it does not alter the objective function at all, which makes it easy to use and means it doesn't interfere with the network's learning dynamics, as shown in the following diagram:
However, this approach isn't perfect—it does have a downside...