Classifier accuracy
Now we need to test our classifier with a bigger test set. In this case, we will randomly select 100 subjects: 50 spam and 50 not spam. We will then count how many times the classifier chooses the correct category:
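import csv

# Count how many test subjects the classifier labels correctly.
with open("test.csv") as f:
    correct = 0
    tests = csv.reader(f)
    for subject in tests:
        # subject[0] is the subject text; subject[1] is the expected category
        clase = classifier(subject[0], w, c, t, tw)
        if clase[1] == subject[1]:
            correct += 1
    print("Efficiency: {0} of 100".format(correct))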
In this case, the classifier labels 82 of the 100 subjects correctly, an accuracy of 82 percent:
>>> Efficiency: 82 of 100
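The test procedure described above (draw a balanced random sample, then count correct predictions) can be sketched on its own as follows. This is only an illustration under stated assumptions: the helper names balanced_sample, accuracy, and keyword_classifier are hypothetical, the labels "spam" and "nospam" are assumed, the subject lines are made up, and the keyword rule is a stand-in for the Naïve Bayes classifier built earlier in the chapter.

```python
import random


def balanced_sample(spam, not_spam, per_class=50, seed=0):
    """Draw per_class subjects from each category and shuffle them together."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    rows = [(s, "spam") for s in rng.sample(spam, per_class)]
    rows += [(s, "nospam") for s in rng.sample(not_spam, per_class)]
    rng.shuffle(rows)
    return rows


def accuracy(rows, classify):
    """Return the fraction of rows whose predicted label matches the expected one."""
    correct = sum(1 for subject, label in rows if classify(subject) == label)
    return correct / len(rows)


def keyword_classifier(subject):
    # Hypothetical stand-in, NOT the chapter's Naive Bayes classifier.
    return "spam" if "free" in subject.lower() else "nospam"


# Made-up subject lines, used only to exercise the helpers.
spam_pool = ["Free offer number {0}".format(i) for i in range(200)]
ham_pool = ["Meeting notes for week {0}".format(i) for i in range(200)]

test_rows = balanced_sample(spam_pool, ham_pool)
score = accuracy(test_rows, keyword_classifier)
print("Efficiency: {0} of {1}".format(round(score * len(test_rows)), len(test_rows)))
```

Keeping the sample balanced means a classifier that always answers with the same category would score only 50 percent, which makes the reported figure easier to interpret.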
Tip
We can find out-of-the-box implementations of the Naïve Bayes classifier, such as the NaiveBayesClassifier class in the NLTK package for Python. NLTK is a powerful natural language toolkit, and we can download it from http://nltk.org/.
In Chapter 11, Sentiment Analysis of Twitter Data, we present a more sophisticated version of the Naïve Bayes classifier to perform sentiment analysis.
In this case, we will find an optimal-size threshold for the...