Regular expressions
Regular expressions, or regexes, are a powerful form of rule-based matching. Invented in the 1950s, they were for a very long time the most useful way to find things in text, and proponents argue that they still are.
No chapter on NLP would be complete without mentioning regexes. That said, this section is by no means a complete regex tutorial. It is intended to introduce the general idea and show how regexes can be used in Python, pandas, and spaCy.
A very simple regex pattern could be "a\.", which finds only instances of the lower-case letter a followed by a dot. The backslash matters: an unescaped dot matches any single character (except a newline), so "a." would also match "ab". Regexes also let you specify ranges and sets of characters; for example, "[a-z]\." finds any lower-case letter followed by a dot, and "[xy]\." finds only the letters "x" or "y" followed by a dot.
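As a minimal sketch of these patterns, the snippet below uses Python's built-in re module. The sample string and the variable names are made up for illustration. Note that a literal dot has to be escaped as "\."; an unescaped "." matches any character except a newline.

```python
import re

# Hypothetical sample text, chosen only to illustrate the patterns.
text = "a. b. x. y. A. ab"

# Escaped dot: the lower-case letter a followed by a literal dot.
literal_dot = re.findall(r"a\.", text)

# Unescaped dot: the lower-case letter a followed by ANY character.
any_char = re.findall(r"a.", text)

# Range: any lower-case letter followed by a literal dot.
lower_range = re.findall(r"[a-z]\.", text)

# Set: only "x" or "y" followed by a literal dot.
xy_set = re.findall(r"[xy]\.", text)

print(literal_dot)  # ['a.']
print(any_char)     # ['a.', 'ab']
print(lower_range)  # ['a.', 'b.', 'x.', 'y.']
print(xy_set)       # ['x.', 'y.']
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslash itself, so the regex engine receives "\." unchanged.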
Regex patterns are case sensitive, so "[A-Z]" would only capture upper-case letters. Character sets are also useful if we are searching for expressions in which the spelling is frequently different; for example,...