Model validation strategy
To validate our models, we can either use a separate dataset or split the dataset we have into training and validation sets using different techniques, as explained in Table 4.5 and illustrated in Figure 4.11. In cross-validation strategies, we split the data into different subsets; the performance score or error for each subset, serving as the validation set, is then calculated using the predictions of a model trained on the rest of the data. We can then report the mean performance across the subsets as the cross-validation performance:
Figure 4.11 – Techniques for separating the validation and training sets within one dataset
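The k-fold procedure described above can be sketched as follows. This is a minimal illustration using NumPy only; the `k_fold_cv` helper and the toy nearest-mean classifier are hypothetical names introduced here for demonstration, and in practice a library implementation such as scikit-learn's `KFold` would typically be used:

```python
import numpy as np

def k_fold_cv(X, y, k, train_and_score):
    """Split indices into k folds; each fold serves once as the
    validation set while the model trains on the remaining folds."""
    rng = np.random.default_rng(0)
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                      X[val_idx], y[val_idx]))
    # Cross-validation performance = mean score across the k folds
    return float(np.mean(scores)), scores

# Toy "model" (hypothetical): predict the class whose training-set
# mean feature vector is closest to the validation sample
def nearest_mean(X_tr, y_tr, X_val, y_val):
    means = {c: X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)}
    preds = [min(means, key=lambda c: np.linalg.norm(x - means[c]))
             for x in X_val]
    return float(np.mean(np.array(preds) == y_val))

# Synthetic two-class data for illustration
X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
mean_score, fold_scores = k_fold_cv(X, y, k=5, train_and_score=nearest_mean)
```

Each sample appears in exactly one validation fold, so the mean of `fold_scores` estimates generalization performance using all of the data, which is the advantage over a single hold-out split discussed next.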
Each of these validation techniques has its advantages and limitations. Using cross-validation techniques instead of hold-out validation has the benefit of covering all or the majority of the data in at least one validation subset. Stratified k-fold cross-validation (CV) is also a better choice compared...