Classifier accuracy
Now we need to test our classifier with a bigger test set; in this case, we will randomly select 100 subjects: 50 spam and 50 not spam. Finally, we will count how many times the classifier chose the correct category:
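import csv

with open("test.csv") as f:
    correct = 0
    tests = csv.reader(f)
    # Each row holds a subject line and its true category
    for subject in tests:
        clase = classifier(subject[0], w, c, t, tw)
        # Count the prediction when it matches the true category
        if clase[1] == subject[1]:
            correct += 1
print("Efficiency: {0} of 100".format(correct))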
In this case, the efficiency is 82 percent:
>>> Efficiency: 82 of 100
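The counting step can also be wrapped in a small helper that reports the result as a fraction of however many rows the test file contains. The following is only a minimal sketch: `accuracy` and `toy_classify` are hypothetical names, the keyword rule stands in for the chapter's `classifier` function, and the sample rows and the `spam`/`nospam` labels are made up for illustration.

```python
import csv
import io


def accuracy(rows, classify):
    """Return (correct, total) for an iterable of (subject, label) rows."""
    correct = 0
    total = 0
    for subject, label in rows:
        total += 1
        if classify(subject) == label:
            correct += 1
    return correct, total


def toy_classify(subject):
    """Hypothetical stand-in classifier: spam if a known spammy word appears."""
    spam_words = {"free", "winner", "prize"}
    words = set(subject.lower().split())
    return "spam" if words & spam_words else "nospam"


# Made-up rows in a two-column (subject, label) layout, read like a CSV file.
sample = io.StringIO(
    "Free prize inside,spam\n"
    "Meeting moved to Monday,nospam\n"
    "You are a winner,spam\n"
    "Lunch tomorrow,nospam\n"
    "Claim your reward now,spam\n"
)

correct, total = accuracy(csv.reader(sample), toy_classify)
print("Efficiency: {0} of {1} ({2:.0%})".format(correct, total, correct / total))
```

With these five rows the stand-in rule misses the last subject, so the script reports 4 of 5. Returning both counts keeps the helper usable for test sets of any size.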
Tip
We can use an out-of-the-box implementation of the Naive Bayes classifier, such as the NaiveBayesClassifier class in the NLTK package for Python. NLTK is a powerful natural language toolkit, and we can download it from http://nltk.org/.
In Chapter 1, Getting Started, we presented a more sophisticated version of the Naïve Bayes classifier to perform a sentiment analysis.
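As a rough illustration of the NLTK route mentioned in the tip, the sketch below trains `nltk.NaiveBayesClassifier` on a handful of subjects. It assumes NLTK is installed; the `subject_features` helper, the training subjects, and the `spam`/`nospam` labels are made up for this example and are not taken from the chapter's dataset.

```python
import nltk


def subject_features(subject):
    """Bag-of-words features: one boolean feature per lowercase token."""
    return {word: True for word in subject.lower().split()}


# Made-up training subjects; the labels are assumptions for illustration.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "nospam"),
    ("project report attached", "nospam"),
]
train_set = [(subject_features(s), label) for s, label in train]

# NLTK expects a list of (feature dict, label) pairs.
nb = nltk.NaiveBayesClassifier.train(train_set)

prediction = nb.classify(subject_features("free money"))
print(prediction)

# nltk.classify.accuracy compares predictions against labeled feature sets.
test_set = [
    (subject_features("free prize"), "spam"),
    (subject_features("monday meeting"), "nospam"),
]
acc = nltk.classify.accuracy(nb, test_set)
print(acc)
```

A real evaluation would load the subjects from the training and test files instead of hard-coding them, and would use a far larger training set than these four lines.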
In this case, we will find an optimal size...