Cross-Validation
In cross-validation, also known as CV, the training data is split into five folds (any number will do, but five is standard). The model is then fit five times, each time on four of the folds, and tested on the remaining held-out fold. The result is five different train/test splits that are all drawn from the same data. The mean of the five test scores is usually taken as the accuracy of the model.
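The fold mechanics described above can be sketched in plain Python. This is an illustrative helper, not a library function; the name `k_fold_indices` and the unshuffled, in-order splitting are assumptions for clarity.

```python
# Minimal sketch of the k-fold splitting described above (k = 5 by default).
# Indices are split in order, without shuffling, purely for illustration.
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any remainder
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

folds = list(k_fold_indices(10, k=5))
# Each of the 10 samples appears in exactly one test fold,
# and in the training set of the other four folds.
```

Each sample serves as test data exactly once, which is why the five scores together cover the whole dataset.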
Note
Five is only a convention. Any integer k ≥ 2 may be used, which is why the technique is often called k-fold cross-validation.
Cross-validation is a core tool for machine learning. Test scores from several folds are more reliable than a single test score on one train/test split, which is what we computed in the first exercise. With a single test score, there is no way of knowing whether it is unusually low or high for this data. Five test scores give a better picture of both the accuracy of the model and how much it varies across splits.
Cross-validation can be implemented in a variety of ways. A standard approach is scikit-learn's cross_val_score, which returns an array containing one score per fold.
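A short sketch of cross_val_score in use follows. The dataset (iris) and estimator (logistic regression) are illustrative choices, not prescribed by the text.

```python
# Hedged sketch: 5-fold cross-validation with scikit-learn's cross_val_score.
# The dataset and model here are arbitrary examples.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into five folds; the model is trained on four
# folds and scored on the held-out fold, five times in total.
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one accuracy score per fold
print(scores.mean())  # the usual summary: the mean of the fold scores
```

Note that cross_val_score handles the splitting, fitting, and scoring internally; the estimator passed in is cloned for each fold, so the original `model` object is left unfitted.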