Creating an NER pipeline
Named entity recognition (NER) is an information extraction technique that recognizes entities in text and classifies them into categories, such as person, location, and organization. For example, say we have the following phrase:
If you run spaCy's en_core_web_lg model against this phrase, you will get the following results:
Snap Inc. - 0 - 9 - ORG - Companies, agencies, institutions, etc.
First Quarter 2021 - 20 - 38 - DATE - Absolute or relative dates or periods
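Output in this shape (entity text, start offset, end offset, label, label description) could be produced with a short script along the following lines. This is a minimal sketch, not the code used later in the pipeline: the function names format_entities and extract_entities are hypothetical, and it assumes spaCy and the en_core_web_lg model are already installed.

```python
from typing import Callable, Iterable, List, Optional


def format_entities(
    ents: Iterable,
    explain: Optional[Callable[[str], Optional[str]]] = None,
) -> List[str]:
    """Render entities as 'text - start - end - label - description' lines.

    Each item in `ents` only needs the attributes spaCy spans expose:
    text, start_char, end_char, and label_.
    """
    lines = []
    for ent in ents:
        # explain() may return None for an unknown label, so fall back to "".
        description = (explain(ent.label_) if explain else None) or ""
        fields = [ent.text, str(ent.start_char), str(ent.end_char), ent.label_, description]
        lines.append(" - ".join(fields))
    return lines


def extract_entities(text: str, model: str = "en_core_web_lg") -> List[str]:
    """Run a spaCy model over `text` and return one formatted line per entity."""
    import spacy  # imported here so format_entities works without spaCy installed

    nlp = spacy.load(model)  # assumes the model has been downloaded beforehand
    doc = nlp(text)
    # spacy.explain maps a label such as "ORG" to a human-readable description.
    return format_entities(doc.ents, spacy.explain)
```

Calling extract_entities(phrase), where phrase holds the text to analyze, and printing each returned line should give one row per recognized entity. The exact entities found depend on the model version.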
Named entity recognition can be useful in a variety of tasks. In this section, we will use it to retrieve the main characters of The Legend of Sleepy Hollow.
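One plausible way to turn recognized entities into a list of main characters is to keep only the entities labeled PERSON and count how often each name appears. The sketch below illustrates that idea under stated assumptions: top_people is a hypothetical helper, not the script this pipeline actually runs, and it expects entities with the text and label_ attributes that spaCy provides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_people(ents: Iterable, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequently mentioned PERSON entities with their counts."""
    counts = Counter(
        ent.text.strip()          # trim stray whitespace around the name
        for ent in ents
        if ent.label_ == "PERSON"  # ignore organizations, dates, and so on
    )
    return counts.most_common(n)
```

With a loaded spaCy model nlp and the story text in a string, top_people(nlp(text).ents) would return (name, count) pairs sorted from most to least frequent. Variant spellings of the same character are counted separately in this simple version.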
Here is what our NER pipeline specification will look like:
---
pipeline:
  name: ner
description: A NER pipeline
input:
  pfs:
    glob: "/text.txt"
    repo: data-clean
transform:
  cmd...