Understanding the Python regex engine
The re
module uses a backtracking regular expression engine; although, in the very well-known book Mastering Regular Expressions by Jeffrey E. F. Friedl, it is classified as Nondeterministic Finite Automata (NFA) type. Also, according to Tim Peters (https://mail.python.org/pipermail/tutor/2006-January/044335.html), the module is not purely NFA.
These are the most common characteristics of the algorithm:
It supports "lazy quantifiers" such as
*?
,+?
, and??
.It matches the first coincidence, even though there are longer ones in the string.
>>>re.search("engineer|engineering", "engineering").group()'engineer'
This also means that order is important.
The algorithm tracks only one transition at one step, which means that the engine checks one character at a time.
Backreferences and capturing parentheses are supported.
Backtracking is the ability to remember the last successful position so that it can go back and retry if needed
In the worst case, complexity...