We performed lemmatization on the sentence The girls were leaving the clubhouse for another adventurous afternoon. A LemmatizerModel was declared and instantiated from the en-lemmatizer.bin file. A try-with-resources block was used to obtained an input stream for the file, as shown in the following code:
LemmatizerModel lemmatizerModel = null;
try (InputStream modelInputStream = new FileInputStream(
"C:\\Downloads\\OpenNLP\\en-lemmatizer.bin")) {
lemmatizerModel = new LemmatizerModel(modelInputStream);
Next, the lemmatizer was created using the LemmatizerME class, as shown in the following code:
LemmatizerME lemmatizer = new LemmatizerME(lemmatizerModel);
The following sentence was processed, and is represented as an array of strings. We also need an array of POS tags for the lemmatization process to work. This array was defined in parallel with the sentence array. As we will see in Chapter 4, Detecting POS Using Neural Networks, there are often alternative tags that are possible for a sentence. For this example, we used tags generated by the Cognitive Computation Group's online tool at http://cogcomp.org/page/demo_view/pos:
String[] tokens = new String[] {
"The", "girls", "were", "leaving", "the",
"clubhouse", "for", "another", "adventurous",
"afternoon", "." };
String[] posTags = new String[] { "DT", "NNS", "VBD",
"VBG", "DT", "NN", "IN", "DT", "JJ", "NN", "." };
The lemmatization then occurred, where the lemmatize method uses the two arrays to build an array of lemmas for each word in the sentence, as shown in the following code:
String[] lemmas = lemmatizer.lemmatize(tokens, posTags);
The lemmas are then displayed, as shown in the following code:
for (int i = 0; i < tokens.length; i++) {
System.out.println(tokens[i] + " - " + lemmas[i]);
}