Implementing named entity recognition
Named entity recognition (NER) is sometimes referred to as finding people and things. Given a text segment, we may want to identify all the names of people present. However, this is not always easy because a word such as Rob can be a name, while rob is also a verb.
In this section, we will demonstrate how to use OpenNLP's TokenNameFinderModel class to find names and locations in text. While there are other entities we may want to find, this example will demonstrate the basics of the technique. We begin with names.
Most names occur within a single line. We do not want to search across multiple lines or sentences because doing so can produce false matches, such as a state name that is not actually present. Consider the following sentences:
Jim headed north. Dakota headed south.
If we ignored the period, then the state of North Dakota might be identified as a location, when in fact it is not present.
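The effect of ignoring the period can be sketched with a few lines of plain Java. This is an illustration only and does not use OpenNLP: the class SentenceBoundaryDemo and its methods are hypothetical names, the sentence splitter is a naive regular expression, and the "recognizer" is a simple phrase lookup standing in for a trained model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; it is not part of OpenNLP.
class SentenceBoundaryDemo {

    // Naive splitter (an assumption for this sketch): break after '.', '!' or '?'
    // when followed by whitespace. Real text needs a proper sentence detector.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Lowercase, drop punctuation, collapse whitespace, and pad with spaces
    // so that phrase lookups only match whole words.
    static String normalize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z\\s]", "")
                .replaceAll("\\s+", " ")
                .trim();
        return " " + cleaned + " ";
    }

    // Looks for the phrase in the whole text, ignoring sentence boundaries.
    static boolean matchIgnoringBoundaries(String text, String entity) {
        return normalize(text).contains(normalize(entity));
    }

    // Looks for the phrase one sentence at a time.
    static boolean matchWithinSentences(String text, String entity) {
        for (String sentence : splitSentences(text)) {
            if (normalize(sentence).contains(normalize(entity))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Jim headed north. Dakota headed south.";
        // The period is ignored here, so the phrase appears to be present.
        System.out.println(matchIgnoringBoundaries(text, "North Dakota"));
        // Each sentence is checked separately, so the phrase is not found.
        System.out.println(matchWithinSentences(text, "North Dakota"));
    }
}
```

Run against the two sentences above, the first lookup reports a match for North Dakota and the second does not, which is why the text is processed one sentence at a time.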
Using OpenNLP to perform NER
We start our example with a try-catch block to handle exceptions. OpenNLP uses models that have been...