Having learned which metrics are used to evaluate a classification model, we can now study how to measure its performance properly. We cannot simply adopt the classification results from a single fixed testing set, as we did in previous experiments. Instead, we usually apply the k-fold cross-validation technique to assess how a model will generally perform in practice.
In the k-fold cross-validation setting, the original data is first randomly divided into k equal-sized subsets, in which the class proportions are often preserved. Each of these k subsets is then successively retained as the testing set for evaluating the model. During each trial, the remaining k-1 subsets (excluding the one-fold holdout) form the training set used to fit the model. Finally, the average performance across all k trials is calculated to produce an overall result.
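The procedure can be illustrated with a short sketch using scikit-learn, where StratifiedKFold preserves class proportions in each subset; the breast cancer dataset, the logistic regression model, and the choice of k = 5 here are illustrative assumptions rather than part of the original discussion:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative dataset and model (assumptions for this sketch)
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=10000)

# StratifiedKFold splits the data into k subsets while preserving
# the class proportions within each subset
k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each subset is held out once as the testing set; the remaining
# k-1 subsets are used to train the model in that trial
scores = cross_val_score(model, X, y, cv=k_fold, scoring='accuracy')

# The average across the k trials gives the overall estimate
print(f'Accuracy per fold: {scores}')
print(f'Mean accuracy: {scores.mean():.4f}')

Running this prints one accuracy value per fold followed by their mean, which is the single number typically reported as the cross-validated performance.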
Statistically, the averaged performance over k-fold cross-validation...