Training a classifier with NLTK-Trainer
In this recipe, we'll cover the train_classifier.py script from NLTK-Trainer, which lets you train NLTK classifiers from the command line. NLTK-Trainer was previously introduced at the end of Chapter 4, Part-of-speech Tagging, and again at the end of Chapter 5, Extracting Chunks.
Note
You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.
How to do it...
As with train_tagger.py and train_chunker.py, the only required argument for train_classifier.py is the name of a corpus. The corpus must have a categories() method, because text classification is all about learning to classify categories. Here's an example of running train_classifier.py on the movie_reviews corpus:
$ python train_classifier.py movie_reviews
loading movie_reviews
2 labels: ['neg', 'pos']
using bag of words feature extraction
2000 training feats, 2000 testing feats
training NaiveBayes classifier
accuracy: 0...
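The output mentions bag of words feature extraction and a count of labeled feature sets. As a rough illustration of what that means, here is a minimal sketch in plain Python. It is not the script's actual code: the bag_of_words and label_feats helpers and the toy_corpus dictionary are hypothetical names made up for this example, and the toy data merely stands in for a categorized corpus such as movie_reviews.

```python
def bag_of_words(words):
    """Map each distinct word to True, giving a dict-style feature set."""
    return {word: True for word in words}


def label_feats(docs_by_label):
    """Turn {label: [token lists]} into a list of (features, label) pairs."""
    return [
        (bag_of_words(tokens), label)
        for label, docs in docs_by_label.items()
        for tokens in docs
    ]


# Hypothetical stand-in for a categorized corpus: two labels, one document each.
toy_corpus = {
    "pos": [["a", "great", "movie"]],
    "neg": [["a", "dull", "plot"]],
}

train_feats = label_feats(toy_corpus)
print(sorted(toy_corpus))   # the labels, analogous to corpus.categories()
print(len(train_feats))     # number of labeled feature sets
print(train_feats[0])       # one (features, label) pair
```

A list of (features, label) pairs in this shape is what NLTK classifiers such as NaiveBayesClassifier take for training, which is why a corpus without categories cannot be used here: there would be no labels to pair with the features.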