Text classification serves many purposes, such as determining a document's type, performing sentiment analysis, and detecting spam. Given a document, we may want to know whether it is fiction or nonfiction. Tweets may contain positive or negative comments about a product or song. Messages may need to be flagged as spam.
In this chapter, we will examine techniques for performing classification and for training models that address specific problem domains. We will use the OpenNLP, Stanford, and LingPipe NLP libraries to illustrate these techniques.
We will cover the following recipes:
- Training a maximum entropy model for text classification
- Classifying documents using a maximum entropy model
- Classifying documents using the Stanford API
- Training a model to classify...