Cross-validation
Cross-validation is a technique that helps data scientists evaluate their models on unseen data. It is especially helpful when the dataset isn't large enough to create three separate splits (training, validation, and testing). Cross-validation helps guard against overfitting by evaluating the model on different partitions of the same data: each pass uses a different training and validation split of the dataset. 10-fold cross-validation is the most common choice. The dataset is divided into 10 disjoint subsets, the model is trained and evaluated 10 times, with each subset serving once as the held-out set, and the metrics are averaged to estimate the model's prediction performance. The procedure is as follows:
- Shuffle the dataset and split it into k different groups (k=10 for 10-fold cross-validation).
- Train the model on k-1 groups and test it on the remaining group.
- Evaluate the model and store the results.
- Repeat the previous two steps, holding out a different group each time, until every group has served once as the test set.
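The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the `train_and_score` callback, the `majority_baseline` toy model, and the toy data are all hypothetical stand-ins, and in practice you would use a library utility such as scikit-learn's `KFold`.

```python
import random

def k_fold_cross_validate(data, labels, k, train_and_score, seed=0):
    """Shuffle the data, split it into k disjoint groups, and average
    the evaluation score across the k train/validate rounds."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)       # shuffle once up front
    folds = [indices[i::k] for i in range(k)]  # k disjoint groups
    scores = []
    for i in range(k):
        val_idx = set(folds[i])                # one group held out
        train_idx = [j for j in indices if j not in val_idx]
        score = train_and_score(
            [data[j] for j in train_idx], [labels[j] for j in train_idx],
            [data[j] for j in folds[i]], [labels[j] for j in folds[i]],
        )
        scores.append(score)                   # store each round's result
    return sum(scores) / k                     # averaged metric

# Toy "model" for demonstration: predict the majority training label,
# and score predictions by accuracy on the held-out group.
def majority_baseline(X_tr, y_tr, X_val, y_val):
    majority = max(set(y_tr), key=y_tr.count)
    return sum(1 for y in y_val if y == majority) / len(y_val)

# Example: 20 samples with imbalanced labels, 10-fold cross-validation.
data = list(range(20))
labels = [0] * 12 + [1] * 8
avg_accuracy = k_fold_cross_validate(data, labels, 10, majority_baseline)
```

Because every sample appears in exactly one validation fold, the averaged score uses the whole dataset for evaluation while never testing the model on data it was trained on in that round.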