Overview of spaCy conventions
Every NLP application consists of several steps of processing the text. As you saw in the first chapter, we always created instances called nlp and doc. But what exactly did we do?
When we call nlp on our text, spaCy applies a sequence of processing steps. The first step is tokenization, which produces a Doc object. The Doc object is then processed further by a tagger, a parser, and an entity recognizer. This way of processing the text is called a language processing pipeline. Each pipeline component returns the processed Doc and then passes it on to the next component.
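To make that contract concrete, here is a minimal sketch of a custom pipeline component; the component name doc_length_logger is our own invention, and the registration API shown assumes spaCy 3.x. Every component receives the Doc, may annotate it, and must return it so that the next component can run:

import spacy
from spacy.language import Language

@Language.component("doc_length_logger")
def doc_length_logger(doc):
    # A component takes a Doc, optionally modifies it, and returns it
    print(f"Doc has {len(doc)} tokens")
    return doc

nlp = spacy.blank("en")              # a tokenizer-only English pipeline
nlp.add_pipe("doc_length_logger")    # append our component to the pipeline
doc = nlp("I went there")            # prints: Doc has 3 tokens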
A spaCy pipeline object is created when we load a language model. We load an English model and initialize a pipeline in the following code segment:
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("I went there")
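As a quick check, we can ask the freshly loaded pipeline which components it will run on the text; the exact names depend on the model and the spaCy version:

# The component names vary by model and spaCy version; an
# en_core_web_* model typically includes tagger, parser, and ner.
print(nlp.pipe_names)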
What exactly happened in the model-loading code is as follows...