If you have run the previous experiment, you may have realized that:
- Both the validation and test results vary, because the samples they are computed on are different.
- The chosen hypothesis is often the best one, but this is not always the case.
Unfortunately, relying on separate validation and test samples introduces uncertainty, and it reduces the number of examples available for training (the fewer the training examples, the higher the variance of the model's estimates).
A solution is cross-validation, and Scikit-learn offers a complete module for cross-validation and performance evaluation (sklearn.model_selection).
By resorting to cross-validation, you just need to separate your data into training and test sets, and you can then use the training data for both model optimization and model training.
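As a minimal sketch of this workflow (the dataset, estimator, and fold count below are illustrative assumptions, not prescribed by the text), you might reserve a test set and then cross-validate only on the training portion:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# A toy dataset, chosen here only for illustration
X, y = load_digits(return_X_y=True)

# Reserve a test set; the training portion will serve both
# model optimization and final model fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 10-fold cross-validation on the training data only;
# the test set stays untouched until the final evaluation
model = LogisticRegression(max_iter=5000)
scores = cross_val_score(model, X_train, y_train, cv=10)
print(scores.mean(), scores.std())
```

In this sketch, cross_val_score refits the estimator on ten different train/validation splits of the training data, so every training example contributes to both fitting and validation while the test set remains set aside for the final check.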
How does cross-validation work? The idea is...