Estimating model performance with k-fold cross-validation
K-fold cross-validation is a common technique for estimating the performance of a classifier because it guards against the over-optimistic estimates that result from evaluating a model on the same data used to build it. Rather than using the entire dataset to build the model, the method splits the data into k folds of roughly equal size; in each round, one fold serves as the testing dataset and the remaining k - 1 folds serve as the training dataset. The model built with the training dataset is then assessed on the testing dataset, and the k accuracies are averaged. By performing n repeats of the k-fold validation, we can use the average of the n accuracies as a more stable estimate of the performance of the built model. In this recipe, we will illustrate how to perform a k-fold cross-validation.
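The procedure described above can be sketched in a few lines of language-agnostic logic. The following Python example is only an illustration under stated assumptions: the data is synthetic (not the telecom churn dataset), the classifier is a toy nearest-centroid model standing in for the support vector machine, and the helper names `k_fold_indices`, `train_centroids`, `predict`, and `cross_validate` are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the row indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(X, y):
    """Toy classifier (an assumption, not an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)


def cross_validate(X, y, k=10, seed=0):
    """Hold out each fold once; train on the other k - 1 folds."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_centroids([X[j] for j in train_idx],
                                [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic two-class data (hypothetical; stands in for a real dataset).
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)])
y = ["no"] * 50 + ["yes"] * 50

fold_acc = cross_validate(X, y, k=10)
print("Per-fold accuracy:", fold_acc)
print("Mean accuracy:", mean(fold_acc))
```

Calling `cross_validate` several times with different `seed` values and averaging the resulting means would correspond to the n repeats mentioned above.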
Getting ready
In this recipe, we will continue to use the telecom churn dataset as the input data source to train the support vector machine. For those who have not prepared the dataset, please refer to Chapter 7, Classification 1 - Tree, Lazy, and Probabilistic, for detailed...