Cross-site scripting attacks occur when user input is not properly sanitized and ends up in pages sent back to users. This makes it possible for an attacker to include malicious scripts in a page by providing them as input to the page. The scripts will be no different than scripts included in pages by the website creators, and will thus have all the privileges of an ordinary script within the page—such as the ability to read cookie data and session IDs. In this article we will look in more detail on how to prevent attacks.
The name "cross-site scripting" is actually rather poorly chosen—the name stems from the first such vulnerability that was discovered, which involved a malicious website using HTML framesets to load an external site inside a frame. The malicious site could then manipulate the loaded external site in various ways—for example, read form data, modify the site, and basically perform any scripting action that a script within the site itself could perform. Thus cross-site scripting, or XSS, was the name given to this kind of attack.
The attacks described as XSS attacks have since shifted from malicious frame injection (a problem that was quickly patched by web browser developers) to the class of attacks that we see today involving unsanitized user input. The actual vulnerability referred to today might be better described as a "malicious script injection attack", though that doesn't give it quite as flashy an acronym as XSS. (And in case you're curious why the acronym is XSS and not CSS, the simple explanation is that although CSS was used as short for cross-site scripting in the beginning, it was changed to XSS because so many people were confusing it with the acronym used for Cascading Style Sheets, which is also CSS.)
Cross-site scripting attacks can lead not only to cookie and session data being stolen, but also to malware being downloaded and executed and injection of arbitrary content into web pages.
Cross-site scripting attacks can generally be divided into two categories:
The most important measure you can take to prevent XSS attacks is to make sure that all user-supplied data that is output in your web pages is properly sanitized. This means replacing potentially unsafe characters, such as angled brackets (< and >) with their corresponding HTML-entity encoded versions—in this case < and >.
Here is a list of characters that you should encode when present in user-supplied data that will later be included in web pages:
Character | HTML-encoded version |
< | < |
> | > |
( | ( |
) | ) |
# | # |
& | & |
" | " |
' | ' |
In PHP, you can use the htmlentities() function to achieve this. When encoded, the string <script> will be converted into <script>. This latter version will be displayed as <script> in the web browser, without being interpreted as the start of a script by the browser.
In general, users should not be allowed to input any HTML markup tags if it can be avoided. If you do allow markup such as <a href="..."> to be input by users in blog comments, forum posts, and similar places then you should be aware that simply filtering out the <script> tag is not enough, as this simple example shows:
<a href="http://www.google.com" onMouseOver="javascript:alert('XSS Exploit!')">Innocent link</a>
This link will execute the JavaScript code contained within the onMouseOver attribute whenever the user hovers his mouse pointer over the link. You can see why even if the web application replaced <script> tags with their HTML-encoded version, an XSS exploit would still be possible by simply using onMouseOver or any of the other related events available, such as onClick or onMouseDown.
I want to stress that properly sanitizing user input as just described is the most important step you can take to prevent XSS exploits from occurring. That said, if you want to add an additional line of defense by creating ModSecurity rules, here are some common XSS script fragments and regular expressions for blocking them:
Script fragment | Regular expression |
<script | <script |
eval( | evals*( |
onMouseOver | onmouseover |
onMouseOut | onmouseout |
onMouseDown | onmousedown |
onMouseMove | onmousemove |
onClick | onclick |
onDblClick | ondblclick |
onFocus | onfocus |
You may have seen the ModSecurity directive SecPdfProtect mentioned, and wondered what it does. This directive exists to protect users from a particular class of cross-site scripting attack that affects users running a vulnerable version of the Adobe Acrobat PDF reader.
A little background is required in order to understand what SecPdfProtect does and why it is necessary. In 2007, Stefano Di Paola and Giorgio Fedon discovered a vulnerability in Adobe Acrobat that allows attackers to insert JavaScript into requests, which is then executed by Acrobat in the context of the site hosting the PDF file. Sound confusing? Hang on, it will become clearer in a moment.
The vulnerability was quickly fixed by Adobe in version 7.0.9 of Acrobat. However, there are still many users out there running old versions of the reader, which is why preventing this sort of attack is still an ongoing concern.
The basic attack works like this: An attacker entices the victim to click a link to a PDF file hosted on www.example.com. Nothing unusual so far, except for the fact that the link looks like this:
http://www.example.com/document.pdf#x=javascript:alert('XSS');
Surprisingly, vulnerable versions of Adobe Acrobat will execute the JavaScript in the above link. It doesn't even matter what you place before the equal sign, gibberish= will work just as well as x= in triggering the exploit.
Since the PDF file is hosted on the domain www.example.com, the JavaScript will run as if it was a legitimate piece of script within a page on that domain. This can lead to all of the standard cross-site scripting attacks that we have seen examples of before.
This diagram shows the chain of events that allows this exploit to function:
The vulnerability does not exist if a user downloads the PDF file and then opens it from his local hard drive.
ModSecurity solves the problem of this vulnerability by issuing a redirect for all PDF files. The aim is to convert any URLs like the following:
http://www.example.com/document.pdf#x=javascript:alert('XSS');
into a redirected URL that has its own hash character:
http://www.example.com/document.pdf#protection
This will block any attacks attempting to exploit this vulnerability. The only problem with this approach is that it will generate an endless loop of redirects, as ModSecurity has no way of knowing what is the first request for the PDF file, and what is a request that has already been redirected. ModSecurity therefore uses a one-time token to keep track of redirect requests. All redirected requests get a token included in the new request string. The redirect link now looks like this:
http://www.example.com/document.pdf?PDFTOKEN=XXXXX#protection
ModSecurity keeps track of these tokens so that it knows which links are valid and should lead to the PDF file being served. Even if a token is not valid, the PDF file will still be available to the user, he will just have to download it to the hard drive.
These are the directives used to configure PDF XSS protection in ModSecurity:
SecPdfProtect On SecPdfProtectMethod TokenRedirection SecPdfProtectSecret "SecretString" SecPdfProtectTimeout 10 SecPdfProtectTokenName "token"
The above configures PDF XSS protection, and uses the secret string SecretString to generate the one-time tokens. The last directive, SecPdfProtectTokenName, can be used to change the name of the token argument (the default is PDFTOKEN). This can be useful if you want to hide the fact that you are running ModSecurity, but unless you are really paranoid it won't be necessary to change this.
The SecPdfProtectMethod can also be set to ForcedDownload, which will force users to download the PDF files instead of viewing them in the browser. This can be an inconvenience to users, so you would probably not want to enable this unless circumstances warrant (for example, if a new PDF vulnerability of the same class is discovered in the future).
One mechanism to mitigate the impact of XSS vulnerabilities is the HttpOnly flag for cookies. This extension to the cookie protocol was proposed by Microsoft (see http://msdn.microsoft.com/en-us/library/ms533046.aspx for a description), and is currently supported by the following browsers:
HttpOnly cookies work by adding the HttpOnly flag to cookies that are returned by the server, which instructs the web browser that the cookie should only be used when sending HTTP requests to the server and should not be made available to client-side scripts via for example the document.cookie property. While this doesn't completely solve the problem of XSS attacks, it does mitigate those attacks where the aim is to steal valuable information from the user's cookies, such as for example session IDs.
A cookie header with the HttpOnly flag set looks like this:
Set-Cookie: SESSID=d31cd4f599c4b0fa4158c6fb; HttpOnly
HttpOnly cookies need to be supported on the server-side for the clients to be able to take advantage of the extra protection afforded by them. Some web development platforms currently support HttpOnly cookies through the use of the appropriate configuration option. For example, PHP 5.2.0 and later allow HttpOnly cookies to be enabled for a page by using the following ini_set() call:
<?php ini_set("session.cookie_httponly", 1); ?>
Tomcat (a Java Servlet and JSP server) version 6.0.19 and later supports HttpOnly cookies, and they can be enabled by modifying a context's configuration so that it includes the useHttpOnly option, like so:
<Context> <Manager useHttpOnly="true" /> </Context>
In case you are using a web platform that doesn't support HttpOnly cookies, it is actually possible to use ModSecurity to add the flag to outgoing cookies. We will see how to do this now.
Assuming we want to add the HttpOnly flag to session identifier cookies, we need to know which cookies are associated with session identifiers. The following table lists the name of the session identifier cookie for some of the most common languages:
Language | Session identifier cookie name |
PHP | PHPSESSID |
JSP | JSESSIONID |
ASP | ASPSESSIONID |
ASP.NET | ASP.NET_SessionId |
The table shows us that a good regular expression to identify session IDs would be (sessionid|sessid), which can be shortened to sess(ion)?id. The web programming language you are using might use another name for the session cookie. In that case, you can always find out what it is by looking at the headers returned by the server:
echo -e "GET / HTTP/1.1nHost:yourserver.comnn"|nc yourserver.com 80|head
Look for a line similar to:
Set-Cookie: JSESSIONID=4EFA463BFB5508FFA0A3790303DE0EA5; Path=/
This is the session cookie—in this case the name of it is JESSIONID, since the server is running Tomcat and the JSP web application language.
The following rules are used to add the HttpOnly flag to session cookies:
# # Add HttpOnly flag to session cookies # SecRule RESPONSE_HEADERS:Set-Cookie "!(?i:HttpOnly)" "phase:3,chain,pass" SecRule MATCHED_VAR "(?i:sess(ion)?id)" "setenv:session_ cookie=%{MATCHED_VAR}" Header set Set-Cookie "%{SESSION_COOKIE}e; HttpOnly" env=session_ cookie
We are putting the rule chain in phase 3—RESPONSE_HEADERS, since we want to inspect the response headers for the presence of a Set-Cookie header. We are looking for those Set-Cookie headers that do not contain an HttpOnly flag. The (?i: ) parentheses are a regular expression construct known as a mode-modified span. This tells the regular expression engine to ignore the case of the HttpOnly string when attempting to match. Using the t:lowercase transform would have been more complicated, as we will be using the matched variable in the next rule, and we don't want the case of the variable modified when we set the environment variable.