Testing the fit of the model using cross-validation
Cross-validation provides a reliable estimate of a model's performance on unseen data. Evaluating the model on multiple subsets of the data reduces the effect of random variation in how the data happen to be split into training and testing sets, giving a more realistic assessment of its generalizability.
K-fold cross-validation divides a dataset into K subsets of approximately equal size, called folds, where K is a predefined number typically chosen between 5 and 10. The dataset is randomly partitioned into the K folds; a model is then trained on K-1 folds and evaluated on the fold left out. This means that K separate training and evaluation cycles are performed. The performance values from the K iterations are then averaged to obtain a single metric that represents the overall performance.
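The procedure above can be sketched in a few lines. The following is a minimal illustration, not a production implementation: the "model" is a simple least-squares line fit via `np.polyfit`, the metric is mean squared error, and the synthetic data are invented for the example; any estimator and metric could be substituted.

```python
import numpy as np

def k_fold_cv(X, y, k=5, seed=0):
    """Estimate performance with K-fold cross-validation (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # random partition of the data
    folds = np.array_split(indices, k)      # K folds of approximately equal size
    scores = []
    for i in range(k):
        test_idx = folds[i]                 # the fold left out this iteration
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Train on the K-1 remaining folds: fit a line by least squares
        coeffs = np.polyfit(X[train_idx], y[train_idx], deg=1)
        # Evaluate on the held-out fold (mean squared error)
        pred = np.polyval(coeffs, X[test_idx])
        scores.append(np.mean((pred - y[test_idx]) ** 2))
    # Average the K performance values into a single metric
    return float(np.mean(scores))

# Synthetic example data: y = 2x + unit-variance noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 100)
y = 2 * X + rng.normal(0, 1, 100)
print(f"5-fold CV mean squared error: {k_fold_cv(X, y, k=5):.3f}")
```

Because the noise in the synthetic data has unit variance, the averaged cross-validated error should land near 1, reflecting the irreducible noise rather than an optimistic in-sample fit.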
Leave-one-out (LOO) cross-validation is a variant of cross-validation where the number of folds K equals the number of observations in the dataset: each model is trained on all observations except one and evaluated on the single observation left out.
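LOO can be sketched as a loop that holds out one observation at a time. As above, this is an illustrative sketch with an invented synthetic dataset and a simple least-squares line fit standing in for the model.

```python
import numpy as np

# Synthetic example data: y = 3x + unit-variance noise, n = 20 observations
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 20)
y = 3 * X + rng.normal(0, 1, 20)

errors = []
for i in range(len(X)):                 # one training/evaluation cycle per observation
    mask = np.ones(len(X), dtype=bool)
    mask[i] = False                     # leave observation i out
    coeffs = np.polyfit(X[mask], y[mask], deg=1)   # train on the n-1 remaining points
    pred = np.polyval(coeffs, X[i:i + 1])          # predict the held-out point
    errors.append((pred[0] - y[i]) ** 2)

print(f"LOO CV mean squared error over {len(X)} fits: {np.mean(errors):.3f}")
```

Note that LOO requires fitting the model n times, which is why it is usually reserved for small datasets or models that are cheap to refit.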