Trying different splitting strategies
As previously discussed, the validation loss is computed on a data sample that is not part of the training set. It is an empirical measure of how well your model predicts, and a more reliable one than the training score, which mostly tells you how well your model has memorized the patterns in the training data. Correctly choosing the data sample you use for validation constitutes your validation strategy.
To summarize, you have two main choices for validating your model and measuring its performance correctly:
- The first choice is to work with a holdout system, incurring the risk of not choosing a sample that properly represents the data, or of overfitting to your validation holdout (see the first sketch after this list).
- The second option is to use a probabilistic approach and rely on a series of samples to draw conclusions about your models. Probabilistic approaches include cross-validation and leave-one-out (see the second sketch after this list).
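The first option can be illustrated with a minimal holdout sketch using scikit-learn; the synthetic dataset and parameter values here are illustrative, not from the original text. Stratifying the split is one way to reduce the risk of an unrepresentative validation sample:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

# Reserve 20% of the data as the holdout; stratify so the class
# proportions in the holdout mirror those of the full dataset.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# The training score mostly reflects memorization; the holdout score
# is the empirical estimate of predictive performance.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("valid accuracy:", accuracy_score(y_valid, model.predict(X_valid)))
```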
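The second option relies on repeated splits rather than a single one. The sketch below, again with an illustrative synthetic dataset, shows k-fold cross-validation and leave-one-out via scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1_000)

# k-fold: every sample is used for validation exactly once across the k
# folds, so the mean score depends less on any single split than a holdout.
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)
print(f"5-fold accuracy: {kfold_scores.mean():.3f} +/- {kfold_scores.std():.3f}")

# Leave-one-out: the extreme case where each fold holds out a single
# sample; nearly unbiased but expensive, since it fits the model n times.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {np.mean(loo_scores):.3f}")
```

Averaging the fold scores gives you an estimate of performance together with a spread, which a single holdout cannot provide.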