Named entity recognition
A common task in NLP is named entity recognition (NER). NER is about finding the real-world things, such as people, organizations, and places, that a text explicitly refers to. Before discussing how it works in more detail, let's jump right in and do some hands-on NER on the first article in our dataset.
The first thing we need to do is load spaCy, in addition to the model for English language processing:
import spacy
nlp = spacy.load('en')
Next, we must select the text of the article from our data:
text = df.loc[0,'content']
Finally, we'll run this piece of text through the English language model pipeline. This will create a Doc instance, something we explained earlier on in this chapter. The Doc instance will hold a lot of information, including the named entities:
doc = nlp(text)
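The named entities that the pipeline found are available on the Doc instance as doc.ents, where each entity carries its text in .text and its type in .label_. The following is a minimal sketch of how they could be listed and counted. The helper name entity_summary is hypothetical, and the stand-in object with its three invented entities is there only so that the sketch runs without a language model installed; it is not output from our article.

```python
from collections import Counter
from types import SimpleNamespace


def entity_summary(doc):
    """Return (text, label) pairs and per-label counts for a spaCy-style Doc."""
    # doc.ents is a sequence of entity spans; each has .text and .label_
    pairs = [(ent.text, ent.label_) for ent in doc.ents]
    counts = Counter(label for _, label in pairs)
    return pairs, counts


# Hypothetical stand-in for a real Doc. The entities are invented for
# illustration and are NOT taken from the article in our dataset.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Washington", label_="GPE"),
    SimpleNamespace(text="Tuesday", label_="DATE"),
    SimpleNamespace(text="New York", label_="GPE"),
])

pairs, counts = entity_summary(fake_doc)
for text, label in pairs:
    print(f"{label:<6} {text}")  # entity type, then the matched text
print(counts.most_common())      # entity types, most frequent first
```

With the objects created in this section, the same helper would be called as entity_summary(doc).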
One of the best features of spaCy is that it comes with a handy visualizer called displacy, which we can use to show the named entities in text. To get the visualizer to generate the display, based on the text from our article, we must run this code:
from...