Lazy quantifiers
By default, regex engines try to match as much as possible when applying a regex. If you matched `The number is \d+` against the string `The number is 108`, then the entire string would match, as `\d+` would be "greedy" and try to match as much as possible (hence matching `\d+` against the entire number `108` and not just the first digit).
Sometimes you want to match as little as possible, and that is where lazy quantifiers come in. A lazy quantifier causes the regex engine to include only the minimum text needed for a match to succeed. You make a quantifier lazy by putting a question mark after it. For example, to make the plus quantifier lazy, you write it as `+?`. The lazy version of our regex would thus be `The number is \d+?`, and when matched against `The number is 108`, the resulting match would be `The number is 1`, as the lazy version of `\d+` is satisfied with a single digit, since that meets the plus quantifier's requirement of "one or more".
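The difference can be checked with a short sketch. Python's `re` module is used here only as an assumption, since the text does not name a language; the variable names are illustrative.

```python
import re

text = "The number is 108"

# Greedy: \d+ takes every digit it can.
greedy = re.search(r"The number is \d+", text)

# Lazy: \d+? is satisfied by the minimum, a single digit.
lazy = re.search(r"The number is \d+?", text)

print(greedy.group())  # The number is 108
print(lazy.group())    # The number is 1
```

Note that the lazy quantifier stops after one digit here only because nothing follows it in the pattern; if more of the pattern came after `\d+?`, it would extend as far as needed for the rest to match.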
The following table lists the lazy quantifiers that are available for use.
Quantifier | Description |
---|---|
`+?` | Lazy plus. |
`*?` | Lazy star. |
`??` | Lazy question mark. |
`{min,max}?` | Lazy range. |
So when are lazy quantifiers needed? One example is extracting the first HTML tag from the string `This is <b>an example</b> of using bold text`. If you use the regex `<.+>`, the resulting match will be `<b>an example</b>`, since the regex engine is greedy and matches as much as possible. In this case `.+` does not stop at the first `>` character: it consumes as many characters as it can, and in a typical backtracking engine it then gives characters back only until the final `>` of the regex can match. The match therefore ends at the last `>` in the string rather than the first.
The solution in this case is to use the lazy version of the plus quantifier, which turns the regex into `<.+?>`. The lazy `.+?` extends the match one character at a time and stops as soon as the `>` that follows it can match, so the regex returns `<b>`, which is exactly what we wanted.
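The HTML example can be reproduced with a minimal sketch, again assuming Python's `re` module (the variable names are illustrative):

```python
import re

text = "This is <b>an example</b> of using bold text"

# Greedy: runs from the first "<" to the last ">" in the string.
greedy = re.search(r"<.+>", text).group()

# Lazy: stops at the first ">" after the opening "<".
lazy = re.search(r"<.+?>", text).group()

# With the lazy version, each tag is found as a separate match.
all_tags = re.findall(r"<.+?>", text)

print(greedy)    # <b>an example</b>
print(lazy)      # <b>
print(all_tags)  # ['<b>', '</b>']
```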