Named entity recognition
Named entity recognition is a subprocess in the natural language processing pipeline. It identifies the names and numbers in the input document. The names can be those of a person, a company, or a location; the numbers can be money amounts or percentages, to name a few. To perform named entity recognition, we will use the Apache OpenNLP TokenNameFinderModel API. To invoke it from the R environment, we will use the openNLP R package:
Load the required libraries:
library(rJava)
library(NLP)
library(openNLP)
Create a sample text; we will extract the entities from this text:
txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. They give 100% to their profession"
We will convert it to a String object for processing:
txt_str <- as.String(txt)
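The conversion matters because annotations refer to the text by start and end character positions, and a String object can be subscripted by those positions. The following is a minimal sketch of that behavior, assuming only the NLP package is installed; the shortened sample sentence and the variable names are illustrative and not part of the original example:

```r
library(NLP)

# Hypothetical shortened sample, used only for this illustration.
sample_str <- as.String("IBM is an MNC with headquarters in New York.")

# Subscripting a String with start and end character positions
# extracts the corresponding substring.
company <- sample_str[1, 3]
print(company)

# A Span object (start, end) can be used as the subscript as well.
city <- sample_str[Span(36, 43)]
print(city)
```

A plain character vector has no such subscript method, so the same extraction would need an explicit substr() call for every annotation.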
We will process the text through the MaxEnt sentence token annotator and the MaxEnt word token annotator, both available in R packages and...