Token-based matching
So far, we've explored sophisticated linguistic concepts that require statistical models, and seen how to use them with spaCy. Some NLU tasks, however, can be solved cleverly without the help of any statistical model. One way to do that is with regex, which we use to match a predefined set of patterns against our text.
A regex (a regular expression) is a sequence of characters that specifies a search pattern. A regex describes the set of strings that follow the specified pattern. A regex can include letters, digits, and characters with special meanings, such as ?, ., and *. Python's built-in re library provides great support for defining and matching regular expressions. There's another Python 3 library called regex that aims to replace re in the future.
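As a minimal sketch of how the special characters ?, ., and * behave, the snippet below uses Python's built-in re module. The sample string and the variable names are made up purely for illustration.

```python
import re

# Hypothetical sample text, chosen only to illustrate the three characters.
text = "color colour colr"

# "?" makes the preceding character optional, so "u" may or may not appear.
optional_u = re.findall(r"colou?r", text)

# "." matches exactly one character of any kind (except a newline).
any_char = re.findall(r"col.r", text)

# "*" allows zero or more repetitions of the preceding character.
zero_or_more = re.findall(r"colo*r", text)

print(optional_u)    # ['color', 'colour']
print(any_char)      # ['color']
print(zero_or_more)  # ['color', 'colr']
```

Here re.findall returns every non-overlapping match as a list of strings, which makes it easy to see which words each pattern accepts.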
Readers who are actively developing NLP applications with Python have most likely come across regex code and may well have written some regex themselves.
What does a regex look like, then? The following regex matches...