A brief introduction
Since version 1.5, Python has provided Perl-style regular expressions, with some subtle exceptions that we will see later. Both the patterns and the strings to be searched can be Unicode strings as well as 8-bit strings (ASCII).
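As a minimal sketch of the two kinds of strings, assuming Python 3 (where text is a str and 8-bit data is a bytes object), the following searches once with a Unicode pattern and once with an 8-bit pattern. The sample strings and variable names are made up for illustration.

```python
import re

# Unicode pattern searched against a Unicode string.
# "\u00e9" is the letter e with an acute accent; \w matches it in Unicode mode.
text_match = re.search(r"caf\w", "un caf\u00e9")
print(text_match.group())

# 8-bit pattern searched against an 8-bit string.
bytes_match = re.search(rb"caf.", b"a cafe")
print(bytes_match.group())

# In Python 3 the pattern and the searched string must be of the same kind.
mixing_failed = False
try:
    re.search(r"caf", b"a cafe")
except TypeError:
    mixing_failed = True
print(mixing_failed)
```

Under this assumption, the pattern and the string it is applied to have to be of the same type, so the last call fails with a TypeError rather than returning a match.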
Tip
Unicode is the universal encoding, with more than 110,000 characters and 100 scripts, designed to represent the characters of all the world's living scripts and even historic ones. You can think of it as a mapping between numbers, or code points as they are called, and characters. So, we can represent every character, no matter in what language, with one single number. For example, the character 是 is the number 26159, and it is represented as \u662f (hexadecimal) in Python.
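The mapping between the code point 26159 and its character (是) can be checked directly with the built-in ord and chr functions. This is a small sketch assuming Python 3, where string literals are Unicode by default; the variable names are only illustrative.

```python
# Build the character from its hexadecimal escape sequence.
char = "\u662f"

# Character -> code point.
code_point = ord(char)
print(code_point)        # 26159
print(hex(code_point))   # 0x662f

# Code point -> character.
print(chr(26159) == char)  # True
```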
Regular expressions are supported by the re module. So, as with all modules in Python, we only need to import it to start playing with them. To do so, start the Python interactive shell and enter the following line of code:
>>> import re
Once we have imported the module, we can start trying to match...