Alternation
Sometimes you want to match one of several phrases. For example, you might want to match Monday written in one of several languages. The pipe character | can be used for this purpose, in the following manner:
Monday|Montag|Lundi
This regex matches any one of Monday, Montag, or Lundi. The pipe character is what makes each of the words an alternative; it can be thought of as an "or" construct if you are familiar with programming.
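As a rough illustration, the alternation can be tried out with Python's re module. Python is an assumption here, since the text does not name a regex engine, and the names day_pattern, samples, and matches are made up for this sketch.

```python
import re

# The alternation from the text: any one of the three words.
day_pattern = re.compile(r"Monday|Montag|Lundi")

# "Tuesday" is included as a word that should not match.
samples = ["Monday", "Montag", "Lundi", "Tuesday"]

# fullmatch succeeds only if the whole string is one of the alternatives.
matches = {word: bool(day_pattern.fullmatch(word)) for word in samples}

print(matches)
# {'Monday': True, 'Montag': True, 'Lundi': True, 'Tuesday': False}
```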
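So how far does alternation reach? In the regex "I remember the day, it was a Monday|Montag|Lundi", does the first alternative refer to "Monday", to "it was a Monday", or to something else? The pipe has the lowest precedence of all regex operators, so each alternative extends as far as it can in both directions. The answer is therefore that the first alternative is the entire first part of the sentence, namely "I remember the day, it was a Monday". The other two alternatives are just Montag and Lundi.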
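A short Python sketch (again assuming Python's re module, with illustrative variable names) shows the consequence: the ungrouped regex accepts a lone Lundi but rejects the full sentence ending in Montag.

```python
import re

# Without parentheses, the three alternatives are:
#   1. "I remember the day, it was a Monday"
#   2. "Montag"
#   3. "Lundi"
ungrouped = re.compile(r"I remember the day, it was a Monday|Montag|Lundi")

full_sentence = "I remember the day, it was a Montag"

# The whole sentence is not one of the three alternatives.
sentence_match = ungrouped.fullmatch(full_sentence)
print(sentence_match)            # None

# A bare "Lundi" is the third alternative, so it matches on its own.
word_match = ungrouped.fullmatch("Lundi")
print(word_match is not None)    # True

# Searching inside the sentence only finds the word "Montag".
print(ungrouped.search(full_sentence).group())  # Montag
```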
This is obviously not what we want from this regex, so we need a way to constrain what the alternation matches. This is done by using parentheses, in the following way:
I remember the day, it was a (Monday|Montag|Lundi)
The parentheses in this regex make sure that the alternation only applies within the parentheses, so the first alternative is restricted to Monday and does not include anything outside the parentheses. Similarly, the last alternative is Lundi only and does not include anything following it.
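The effect of the parentheses can be checked with a minimal Python sketch (Python's re module is an assumption, and the variable names are illustrative). In Python, plain parentheses also capture what they match, which the sketch uses to read back the chosen alternative.

```python
import re

# The parentheses limit the alternation to the three day names.
grouped = re.compile(r"I remember the day, it was a (Monday|Montag|Lundi)")

sentence_match = grouped.fullmatch("I remember the day, it was a Montag")

# The full sentence now matches, and group 1 holds the alternative used.
print(sentence_match.group(1))   # Montag

# A lone day name no longer matches, because the leading text is required.
lone_word_match = grouped.fullmatch("Lundi")
print(lone_word_match)           # None
```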