Chapter 4: Rule-Based Matching
Rule-based information extraction is indispensable for any NLP pipeline. Certain types of entities, such as times, dates, and telephone numbers, have distinct formats that can be recognized by a set of rules, without having to train statistical models.
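To make the idea of recognizing a format with rules alone more concrete, here is a minimal sketch that uses only Python's standard `re` module rather than spaCy. The pattern, the name `PHONE_RULE`, the helper `find_phone_numbers`, and the sample sentence are all illustrative assumptions; the rule covers only a couple of US-style layouts and is not meant as a complete phone number recognizer.

```python
import re

# Hypothetical rule (an assumption for illustration): US-style numbers
# written as "(425) 123-4567" or "425-123-4567". No model is trained.
PHONE_RULE = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def find_phone_numbers(text):
    """Return every substring of text that fits the phone number rule."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PHONE_RULE.findall(text)


sample = "Call (425) 123-4567 before noon, or 425-123-4568 after."
matches = find_phone_numbers(sample)
print(matches)
```

A hand-written rule like this is quick to write and easy to inspect, but it only recognizes the layouts it spells out, which is one reason to pair rules with token-level features and statistical models.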
In this chapter, you will learn how to quickly extract information from text by matching patterns and phrases. You will use morphological features, POS tags, regex, and other spaCy features to build pattern objects to feed to Matcher objects. You will then refine statistical models with rule-based matching to improve their accuracy.
By the end of this chapter, you will have learned a vital part of information extraction. You will be able to extract entities that follow specific formats, as well as entities specific to your domain.
In this chapter, we're going to cover the following main topics:
- Token-based matching
- PhraseMatcher
- EntityRuler
- Combining...