Why is NER difficult?
Like many NLP tasks, NER is not always simple. Although tokenizing a text reveals its components, understanding what those components are can be difficult. Relying on proper nouns alone will not always work because language is ambiguous. For example, Penny and Faith, while valid names, may also refer to a unit of currency and a belief, respectively. Words such as Georgia can be the name of a country, a state, or a person. Nor can we compile a list of all people, places, or other entities, because they are not predefined. Consider the following two simple sentences:
- Jobs are harder to find nowadays
- Jobs said dots will always connect
In these two sentences, Jobs seems to be the entity, but the two uses are not related, and in the first sentence it is not even an entity. We need more sophisticated techniques that take the surrounding context into account to decide whether a word is an entity. Sentences may also refer to the same entity in different ways. Say, for example, IBM and International...
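The limits of a fixed name list can be seen in a minimal sketch. The snippet below is not a real NER system: `PERSON_NAMES` and `naive_tag` are hypothetical names introduced only for illustration, and the list contents are taken from the examples in this section. It marks any token found in the list as a person, without looking at context.

```python
# Hypothetical gazetteer: a tiny hand-made set of person names (assumption for illustration).
PERSON_NAMES = {"Jobs", "Penny", "Faith", "Georgia"}


def naive_tag(sentence):
    """Return every token that appears in the name list, ignoring context."""
    tokens = sentence.split()  # simple whitespace tokenization
    return [tok for tok in tokens if tok in PERSON_NAMES]


sentences = [
    "Jobs are harder to find nowadays",
    "Jobs said dots will always connect",
]

for s in sentences:
    # The lookup cannot tell a common noun from a person's name.
    print(s, "->", naive_tag(s))
```

Both sentences produce the same match for Jobs, even though only one of them refers to a person. Capitalization does not help here either, since the word is sentence-initial in both cases. Telling the two apart requires looking at the surrounding words.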