Splitting partitions the dataset into a training set and a testing set, so that the model can be evaluated on data it did not see during training.
Splitting the dataset for training and testing
How to do it...
- Add the following code fragment into the same Python file:
# sklearn.cross_validation was removed in scikit-learn 0.20;
# model_selection provides the same train_test_split function
from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB
import numpy as np
import matplotlib.pyplot as plt

in_file = 'data_multivar.txt'
a = []
b = []
# Each line is comma-separated: all values but the last go to a, the last goes to b
with open(in_file, 'r') as f:
    for line in f:
        data = [float(x) for x in line.split(',')]
        a.append(data[:-1])
        b.append(data[-1])
a = np.array(a)
b = np.array(b)
- Allocate 75% of data for training and 25% of data for testing:
a_training, a_testing, b_training, b_testing = model_selection.train_test_split(a, b, test_size...
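The 75/25 split can be checked in isolation with a minimal sketch like the one below. It assumes scikit-learn 0.18 or later, where train_test_split lives in sklearn.model_selection (the older sklearn.cross_validation module no longer exists in current releases). The synthetic arrays stand in for the contents of data_multivar.txt, and the random_state value is an arbitrary choice made here only to keep the shuffle reproducible; neither comes from the recipe itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data_multivar.txt (assumption):
# 100 samples, 2 features each, labels drawn from 4 classes
rng = np.random.default_rng(0)
a = rng.normal(size=(100, 2))
b = rng.integers(0, 4, size=100).astype(float)

# Hold out 25% of the samples for testing; the rest is used for training.
# random_state is an arbitrary seed that makes the shuffle repeatable.
a_training, a_testing, b_training, b_testing = train_test_split(
    a, b, test_size=0.25, random_state=5)

print(a_training.shape, a_testing.shape)  # (75, 2) (25, 2)
```

Because the samples are shuffled before splitting, fixing the seed is what makes two runs produce the same training and testing sets; without it, accuracy figures can vary slightly from run to run.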