Cross-validation is an essential technique in machine learning. We start by splitting the dataset into a train set and a test set. To build a robust classifier, we repeat this split over different partitions of the data, taking care not to overfit the model. Overfitting means that the model gives excellent predictions on the train set but very poor results on the test set; in other words, an overfitted model generalizes poorly.
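To make the splitting step concrete, here is a minimal sketch using scikit-learn's train_test_split on a small synthetic dataset; the array shapes, the 25% test size, and the random seed are illustrative choices, not values from this recipe.

from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic data, for illustration only: 100 samples with 3 features each
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# Hold out 25% of the samples as a test set; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=7)

Repeating this split over different partitions and averaging the resulting scores is exactly what cross-validation automates, as the following recipe shows.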
Evaluating the accuracy using cross-validation
How to do it...
- Import the packages:
from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB
import numpy as np

# Load the multivariate data; each row is assumed to hold
# comma-separated feature values followed by the class label.
in_file = 'cross_validation_multivar.txt'
a = []
b = []
with open(in_file, 'r') as f:
    for line in f.readlines():
        data = [float(x) for x in line.split(',')]
        a.append(data[:-1])  # feature values
        b.append(data[-1])   # class label
a = np.array(a)
b = np.array(b)
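With the features in a and the labels in b, the accuracy can then be estimated with k-fold cross-validation. The snippet below is a minimal sketch of that step; the Gaussian Naive Bayes classifier comes from the imports above, while the choice of five folds is an assumption made here for illustration.

# Build the classifier and evaluate it with 5-fold cross-validation
classifier = GaussianNB()
num_folds = 5  # assumed fold count, not specified in the original text
accuracy = model_selection.cross_val_score(classifier, a, b, scoring='accuracy', cv=num_folds)
print("Accuracy: " + str(round(100 * accuracy.mean(), 2)) + "%")

cross_val_score returns one score per fold, so reporting the mean (and, if needed, the standard deviation) summarizes how well the model generalizes across the partitions.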