Backreferences
Backreferences let you capture the text matched by part of a regular expression so that it can be referred to later. The regex Hello, my name is (.+) captures the name into a variable, because the .+ construct is surrounded by parentheses.
The name of the variable that holds the captured text depends on the regex flavor you are working with. In Perl, for example, captures are stored in variables named $1, $2, $3, and so on.
Captures are numbered left to right in the order their opening parentheses appear: the text matched within the first parentheses is captured into $1, the second into $2, and so forth. Captures can also be nested inside another set of parentheses, so the regex My full name is ((\w+) \w+) would store the complete name (first and last) in $1 and the first name only in $2.
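The numbering of nested groups can be checked in any flavor that supports numbered captures. The following is a minimal sketch using Python's re module rather than Perl, so group(1) and group(2) stand in for $1 and $2; the function name and the sample sentence are made up for illustration.

```python
import re

# The nested-group regex from the text. The outer parentheses open first,
# so they become group 1; the inner (\w+) opens second and becomes group 2.
FULL_NAME = re.compile(r"My full name is ((\w+) \w+)")


def capture_names(text):
    """Return (full name, first name), or None if the regex does not match."""
    match = FULL_NAME.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical sample input, not taken from the text.
print(capture_names("My full name is John Smith"))
```

With this input, group 1 holds the full name and group 2 holds only the first name, matching the $1 and $2 assignment described above.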
These are the same parentheses used for grouping, so grouping with standard parentheses also creates a backreference. We will, however, shortly see how to group without capturing backreferences.
Captures and ModSecurity
To use captured backreferences in a ModSecurity rule, you specify the capture action in the rule, which makes the captured backreferences available in the transaction variables TX:1 through TX:9.
The following rule uses a regex that looks for a browser name and version number in the User-Agent request header. If found, the version number is captured into the transaction variable TX:1 (which is accessed using the syntax %{TX.1}) and is subsequently logged to the error log file:
SecRule REQUEST_HEADERS:User-Agent "Firefox/(\d\.\d\.\d)" "pass,phase:2,capture,log,logdata:%{TX.1}"
Up to nine captures can be made this way. The transaction variable TX:0 holds the entire regex match, so in the above example it would contain something like Firefox/3.0.9.
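To get a feel for what TX:0 and TX:1 would hold for a given header, the same pattern can be tried outside ModSecurity. This is a rough sketch in Python's re module, which is not the regex engine ModSecurity uses, although a simple ASCII pattern like this one should give the same result. The function name, the dictionary keys, and the sample User-Agent string are hypothetical and only mimic the transaction variables.

```python
import re

# Same pattern as in the SecRule above.
FIREFOX_VERSION = re.compile(r"Firefox/(\d\.\d\.\d)")


def simulate_capture(user_agent):
    """Return a dict mimicking TX:0 and TX:1, or an empty dict if no match."""
    match = FIREFOX_VERSION.search(user_agent)
    if match is None:
        return {}
    # group(0) is the entire match, group(1) is the first captured group.
    return {"TX:0": match.group(0), "TX:1": match.group(1)}


# Hypothetical User-Agent header value.
ua = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.9) Gecko/2009042113 Firefox/3.0.9"
print(simulate_capture(ua))
```

Note that each \d matches exactly one digit, so a version such as 3.0.10 would be captured as 3.0.1 by this pattern.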