Classifying unseen test data
The classic supervised machine-learning classification task is to train a classifier on labeled training instances and then apply it to unseen test instances. The key thing to remember is that the number of attributes, their types, their names, and their ranges of values (for regular nominal attributes and nominal class attributes) must be exactly the same in the training dataset and the test dataset.
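The compatibility requirement above can be checked before any classifier is trained. The following is a minimal sketch using only the Java standard library, not Weka's own API: it compares the `@attribute` declarations of two ARFF texts as normalized strings. The class name `ArffHeaderCheck`, its methods, and the two-attribute header are hypothetical and chosen for illustration. The comparison is deliberately strict, so a differently cased type keyword would count as a mismatch.

```java
import java.util.ArrayList;
import java.util.List;

public class ArffHeaderCheck {

    // Collects the "@attribute" declarations that appear before "@data".
    // Runs of whitespace are collapsed so spacing differences are ignored.
    static List<String> attributeDeclarations(String arffText) {
        List<String> declarations = new ArrayList<>();
        for (String rawLine : arffText.split("\\R")) {
            String line = rawLine.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section begins
            }
            if (lower.startsWith("@attribute")) {
                String rest = line.substring("@attribute".length()).trim();
                declarations.add(rest.replaceAll("\\s+", " "));
            }
        }
        return declarations;
    }

    // True when both files declare the same attributes, in the same order,
    // with the same types and the same nominal value lists.
    static boolean sameHeader(String trainArff, String testArff) {
        return attributeDeclarations(trainArff)
                .equals(attributeDeclarations(testArff));
    }

    public static void main(String[] args) {
        // Illustrative header only; a real iris file declares more attributes.
        String header = String.join("\n",
                "@RELATION iris",
                "@ATTRIBUTE petalwidth NUMERIC",
                "@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}");
        String train = header + "\n@DATA\n0.2,Iris-setosa";
        String test = header + "\n@DATA\n0.2,Iris-setosa";
        // A test file whose class attribute lacks one label is incompatible.
        String badTest = String.join("\n",
                "@RELATION iris",
                "@ATTRIBUTE petalwidth NUMERIC",
                "@ATTRIBUTE class {Iris-setosa,Iris-versicolor}",
                "@DATA",
                "0.2,Iris-setosa");

        System.out.println("train vs test:    " + sameHeader(train, test));
        System.out.println("train vs badTest: " + sameHeader(train, badTest));
    }
}
```

When the datasets are already loaded as Weka `Instances` objects, the library's `equalHeaders` method is likely the more natural way to perform this check. The string-based version here only makes the rule visible without any dependency.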
Getting ready
In Weka, a training dataset and a testing dataset can differ in one key respect. The @DATA section of a testing ARFF file can look just like the @DATA section of a training ARFF file, with attribute values and class labels, as follows:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
When a classifier is applied to such labeled test data, it ignores the class labels when predicting the class...