Splitting the dataset for training and testing
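Splitting partitions the dataset into a training subset, used to fit the classifier, and a testing subset, held back so that performance is measured on data the classifier has not seen.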
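To make the idea concrete, here is a minimal sketch of a shuffled hold-out split in plain Python. The function name `holdout_split`, its parameters, and the toy data are hypothetical and chosen only for illustration. This is not how scikit-learn implements its splitter, and a given seed will not reproduce scikit-learn's split; it only shows the same idea in miniature.

```python
import random

def holdout_split(samples, labels, test_fraction=0.25, seed=5):
    """Shuffle the sample indices, then hold out `test_fraction` of them for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split repeatable
    n_test = int(round(len(indices) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    # Samples and labels are selected with the same indices, so pairs stay aligned
    return ([samples[i] for i in train_idx], [samples[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

# Toy data (made up for illustration): 8 two-feature samples with binary labels
toy_a = [[float(i), float(i) * 2.0] for i in range(8)]
toy_b = [i % 2 for i in range(8)]

a_train, a_test, b_train, b_test = holdout_split(toy_a, toy_b)
print(len(a_train), len(a_test))  # 6 training samples, 2 testing samples
```

Shuffling before cutting matters when the file is sorted by class: without it, the testing subset could contain only one label.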
How to do it...
- Add the following code fragment into the same Python file:
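# train_test_split lives in sklearn.model_selection; the older
# sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
import numpy as np
import matplotlib.pyplot as plt

in_file = 'data_multivar.txt'
a = []
b = []
# Each line holds comma-separated feature values followed by the class label
with open(in_file, 'r') as f:
    for line in f:
        data = [float(x) for x in line.split(',')]
        a.append(data[:-1])  # features
        b.append(data[-1])   # label
a = np.array(a)
b = np.array(b)
- Allocate 75% of the data for training and 25% for testing, then train the classifier on the training portion:
# random_state fixes the shuffle so that the split is reproducible
a_training, a_testing, b_training, b_testing = train_test_split(a, b, test_size=0.25, random_state=5)
classification_gaussiannb_new = GaussianNB()
classification_gaussiannb_new.fit(a_training, b_training)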
- Evaluate the classifier's performance on the test data by predicting its labels:
b_test_pred = classification_gaussiannb_new.predict(a_testing)
- Compute the accuracy of the classifier system:
correctness...
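One common way to score predictions is the percentage of test samples whose predicted label matches the true label. The sketch below assumes that definition, and the recipe's own computation may differ. The helper name `accuracy_percent` and the toy label lists are hypothetical stand-ins for `b_testing` and `b_test_pred`.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Return the percentage of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * matches / len(true_labels)

# Toy labels (made up for illustration); in the recipe these would be
# b_testing and b_test_pred
toy_true = [0.0, 1.0, 1.0, 2.0]
toy_pred = [0.0, 1.0, 2.0, 2.0]

score = accuracy_percent(toy_true, toy_pred)
print("Accuracy of the classifier =", round(score, 2), "%")  # 3 of 4 correct: 75.0
```

scikit-learn also provides `sklearn.metrics.accuracy_score`, which reports the same quantity as a fraction between 0 and 1 instead of a percentage.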