Evaluating the accuracy using cross-validation
Cross-validation is essential in machine learning. We start by splitting the dataset into a train set and a test set; to build a robust classifier, we repeat this split several times, because a single split can give a misleading estimate of performance. While doing this, we need to avoid overfitting the model. Overfitting means the model achieves excellent prediction results on the train set but very poor results on the test set; in other words, it generalizes badly to unseen data.
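The train/test split described above can be sketched as follows. This is a minimal illustration using scikit-learn's `train_test_split` with synthetic data (the synthetic arrays stand in for the recipe's data file); comparing train and test accuracy is how overfitting would show up:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data for illustration only (not the recipe's input file)
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))
```

A large gap between the two accuracy values (high on train, low on test) would indicate overfitting; here the two should be close.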
How to do it...
- Import the packages:
from sklearn import cross_validation  # moved to sklearn.model_selection in scikit-learn >= 0.20
from sklearn.naive_bayes import GaussianNB
import numpy as np

# Load the multivariate data; the last column of each row is the label
in_file = 'cross_validation_multivar.txt'
a = []
b = []
with open(in_file, 'r') as f:
    for line in f.readlines():
        data = [float(x) for x in line.split(',')]
        a.append(data[:-1])
        b.append(data[-1])

a = np.array(a)
b = np.array(b)

classification_gaussiannb = GaussianNB()
- Compute the accuracy of the classifier:
num_of_validations = 5
accuracy = cross_validation.cross_val_score(classification_gaussiannb,
        a, b, scoring='accuracy', cv=num_of_validations)
print("Accuracy: " + str(round(100 * accuracy.mean(), 2)) + "%")
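Since the `cross_validation` module was removed in scikit-learn 0.20, a self-contained version of this step using the current `sklearn.model_selection` API looks like the sketch below. Synthetic arrays stand in for the recipe's data file, which is an assumption for illustration:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the recipe's input file
rng = np.random.RandomState(1)
a = rng.randn(150, 3)
b = (a.sum(axis=1) > 0).astype(int)

# 5-fold cross-validation: cross_val_score returns one accuracy per fold
num_of_validations = 5
accuracy = cross_val_score(GaussianNB(), a, b,
                           scoring='accuracy', cv=num_of_validations)
print("Accuracy: " + str(round(100 * accuracy.mean(), 2)) + "%")
```

Averaging the per-fold scores gives a more reliable accuracy estimate than a single train/test split.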