Maximum entropy is a statistical technique that can be used to classify documents. In this recipe, we will use OpenNLP to train a maximum entropy model. Specifically, we will use the OpenNLP DocumentCategorizerME class. In the next recipe, Classifying documents using a maximum entropy model, we will use the trained model to classify text.
To train the model, we need a set of training data. We will use a data set that differentiates between text that relates to frogs and text that relates to rats.
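To make the shape of such training data concrete, the following is a minimal sketch that writes a small training file using only the Java standard library. It assumes the plain-text layout that OpenNLP's document categorizer tools read: one sample per line, with the category label first, followed by whitespace and the document text. The class name TrainingDataSketch, the file name en-frograt.train, the category labels Frog and Rat, and the sample sentences are all hypothetical and chosen for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrainingDataSketch {

    // Hypothetical sample sentences, grouped by category label.
    // A real training set needs far more samples per category.
    static Map<String, List<String>> samples() {
        Map<String, List<String>> byCategory = new LinkedHashMap<>();
        byCategory.put("Frog", List.of(
                "The frog sat on a lily pad in the pond.",
                "Tadpoles grow legs and become frogs."));
        byCategory.put("Rat", List.of(
                "The rat ran along the sewer pipe.",
                "Rats are rodents with long tails."));
        return byCategory;
    }

    // Builds one line per sample: category label, a space, then the text.
    static List<String> buildLines(Map<String, List<String>> byCategory) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byCategory.entrySet()) {
            for (String text : entry.getValue()) {
                // Collapse runs of whitespace (including line breaks)
                // so that each sample stays on a single line.
                String singleLine = text.replaceAll("\\s+", " ").trim();
                lines.add(entry.getKey() + " " + singleLine);
            }
        }
        return lines;
    }

    // Returns the category label, which is the first token of a line.
    static String categoryOf(String line) {
        return line.substring(0, line.indexOf(' '));
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = buildLines(samples());
        Path file = Paths.get("en-frograt.train"); // hypothetical file name
        Files.write(file, lines, StandardCharsets.UTF_8);
        System.out.println("Wrote " + lines.size() + " samples to " + file);
    }
}
```

Running the main method writes four lines to the file, two per category. Keeping the label as the first token of each line means that a line-oriented reader can recover the category without any extra markup.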