Context

Published in PHP Architect on 22 Dec 2005

A textbook definition of context is “the circumstances in which an event occurs; a setting.” When speaking about PHP security, the context of your data is important. This is better explained by example:

    <?php

    $username = 'chris';

    $filename =
      "http://host/profile.php?user=$username";

    $contents = file_get_contents($filename);

    ?>

Let’s examine each statement individually:

  1. A literal string is assigned to the variable $username. The data is chris, and its context is a PHP variable.
  2. The data within $username is interpolated within another string, and this string is assigned to the variable $filename. The data is http://host/profile.php?user=chris, and its context is a PHP variable.
  3. This is the most important line of code in the example, primarily because it changes the context in which the data within $username and $filename is used. Because file_get_contents() can be used to fetch content from remote sources (if allow_url_fopen is enabled, which it is by default), this line of code initiates an HTTP request. Whereas, before, the data in $filename was just a string, now it’s actually being used as a URL. More importantly, the data in $username is now the value of a query string parameter within this URL. The context has changed.

Because chris is safe to use as a query string parameter without interfering with the format of the URL, this works as expected. However, it is always best to escape output, and this is no exception. I prefer to use arrays to help me keep track of data that has been prepared for a specific context:

    <?php

    $url = array();

    $username = 'chris';
    $url['username'] = urlencode($username);

    $filename =
      "http://host/profile.php?user={$url['username']}";

    $contents = file_get_contents($filename);

    ?>

Although using urlencode() on chris is unnecessary (it does nothing), this example illustrates a best practice. It’s not always possible to solve output problems by restricting the format of your data, and urlencode() is an escaping function created specifically for this purpose: preserving data in the context of a URL.
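To see why the escaping matters once the data is less benign, consider a hypothetical username that contains characters with special meaning in a URL. The following is a minimal sketch; the value chris&admin=1 and the use of parse_url() and parse_str() to mimic what the receiving script would see are assumptions for illustration, not part of the original example:

```php
<?php

// Hypothetical username containing characters that are special in a URL.
$username = 'chris&admin=1';

// Without escaping, the data changes the format of the query string.
$unescaped = "http://host/profile.php?user=$username";

// With escaping, the data is preserved as a single parameter value.
$url = array();
$url['username'] = urlencode($username);
$escaped = "http://host/profile.php?user={$url['username']}";

// Mimic how the receiving script would populate $_GET.
parse_str(parse_url($unescaped, PHP_URL_QUERY), $broken);
parse_str(parse_url($escaped, PHP_URL_QUERY), $preserved);

print_r($broken);    // two parameters: user and admin
print_r($preserved); // one parameter: user
```

Without urlencode(), the ampersand ends the user parameter and starts a new one. With it, the receiving script gets the original value back intact.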

Here’s another example:

    <?php

    $target = 'http://host/profile.php?user=chris';

    $query_string = "?target={$target}";

    $filename =
      "http://host/login.php{$query_string}";

    $contents = file_get_contents($filename);

    ?>

What should be escaped with urlencode()?

The answer is $target, because it is being used as the value of a query string parameter:

    <?php

    $url = array();

    $target = 'http://host/profile.php?user=chris';

    $url['target'] = urlencode($target);

    $query_string = "?target={$url['target']}";

    $filename =
      "http://host/login.php{$query_string}";

    $contents = file_get_contents($filename);

    ?>

The data within $filename is the following:

    http://host/login.php?target=http%3A%2F%2Fhost%2Fprofile.php%3Fuser%3Dchris

Therefore, the value of $_GET['target'] in the login.php script is the following:

    http://host/profile.php?user=chris

In other words, the original value has been preserved, despite the fact that the data has existed in multiple contexts.

Another important point is that the escaping needs to be performed on individual parameter values; it should not modify the overall format of the URL.
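A short sketch of that distinction, using a hypothetical second parameter named lang: each value is escaped on its own, and the characters that give the URL its structure are left alone. Escaping the assembled URL as a whole would destroy that structure.

```php
<?php

$url = array();

// Escape each parameter value individually.
$url['target'] = urlencode('http://host/profile.php?user=chris');
$url['lang'] = urlencode('en'); // hypothetical second parameter

// The ?, =, and & that structure the URL are added unescaped.
$filename =
  "http://host/login.php?target={$url['target']}&lang={$url['lang']}";

// Escaping the whole URL instead mangles its format.
$mangled = urlencode('http://host/login.php?lang=en');

echo $filename, "\n";
echo $mangled, "\n";
```

The first string is still a URL with two parameters. The second no longer has a recognizable scheme, host, or query string.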

HTML

Most PHP apps generate output that is rendered by a browser, and this is usually of type text/html, typically identified in a Content-Type HTTP entity header:

    Content-Type: text/html

Preferably, the character set is also indicated:

    Content-Type: text/html; charset=UTF-8

HTML has its own format, and the data we generate in PHP can affect that format. For example:

    <?php

    $first = 'Chris';
    $last = 'Shiflett';

    $name = "{$first} {$last}";

    echo "<p><b>$name</b></p>";

    ?>

In this simple example, the name is going to be displayed in bold and within its own paragraph. This markup is intentional. The data within $first and $last is intended to be raw data, however, free of markup or anything that might be interpreted by the browser. Therefore, it’s best to escape it:

    <?php

    $html = array();

    $first = 'Chris';
    $last = 'Shiflett';

    $html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');
    $html['last'] = htmlentities($last, ENT_QUOTES, 'UTF-8');

    $name = "{$html['first']} {$html['last']}";

    echo "<p><b>$name</b></p>";

    ?>

Although the escaping is unnecessary in this case, skipping this step can easily create cross-site scripting (XSS) vulnerabilities:

    <?php

    $first = $_POST['first'];
    $last = $_POST['last'];

    $name = "{$first} {$last}";

    echo "<p><b>$name</b></p>";

    ?>

Although this example also illustrates a failure to filter input, escaping alone prevents the cross-site scripting (XSS) vulnerability:

    <?php

    $html = array();

    $first = $_POST['first'];
    $last = $_POST['last'];

    $html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');
    $html['last'] = htmlentities($last, ENT_QUOTES, 'UTF-8');

    $name = "{$html['first']} {$html['last']}";

    echo "<p><b>$name</b></p>";

    ?>

Unless the data in $first and $last is treated strictly as raw data, an attacker can supply values in $_POST['first'] and/or $_POST['last'] that the browser interprets as markup or script, exploiting any client-side technology available in this context.
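As an illustration, here is what the escaping does to a hypothetical malicious value. The payload is an assumption chosen for the example; the htmlentities() call is the same one used above:

```php
<?php

$html = array();

// Hypothetical value an attacker might submit in $_POST['first'].
$first = "<script>alert('XSS');</script>";

$html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');

// The browser displays the characters instead of executing a script.
echo "<p><b>{$html['first']}</b></p>", "\n";
```

The angle brackets and quotes arrive at the browser as entities, so the value is rendered as text and never parsed as a script element.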

Sometimes, even in the context of HTML, there are subtle differences:

    <input type="text" name="user" value="<?php echo $_POST['user']; ?>" />

This also illustrates a cross-site scripting (XSS) vulnerability, but an exploit must be slightly different, because the context of the data within $_POST['user'] is now the value of an attribute of an HTML tag. For example, to generate a simple popup window in JavaScript, the following value can be provided:

    "><script>alert('XSS');</script><"

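Luckily, these subtle differences don’t affect PHP developers much, because htmlentities(), when called with ENT_QUOTES as in the examples above, accounts for the characters that can alter the context of data within HTML, provided attribute values are always quoted.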
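The flags matter here. As a sketch of the difference, assume a form field whose value attribute is delimited by single quotes and a hypothetical payload that tries to add an event handler. With ENT_COMPAT only double quotes are converted, so the payload survives; with ENT_QUOTES both kinds of quote are converted:

```php
<?php

// Hypothetical payload aimed at an attribute delimited by single quotes.
$user = "' onmouseover='alert(1)";

// ENT_COMPAT leaves single quotes alone, so the data can end the attribute.
$compat = htmlentities($user, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES converts single quotes as well, so the data stays a value.
$quotes = htmlentities($user, ENT_QUOTES, 'UTF-8');

echo "<input type='text' name='user' value='{$compat}' />", "\n";
echo "<input type='text' name='user' value='{$quotes}' />", "\n";
```

Passing the flags explicitly, rather than relying on the default, keeps the behavior the same regardless of which PHP version runs the code.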

SQL

When PHP developers send data to a database, it is often through the use of an SQL query:

    $sql = 'SELECT * FROM users';

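There are many databases with which PHP can communicate, and each client library has its own function that executes SQL queries. For example, the original mysql extension provides mysql_query() (the extension was removed in PHP 7.0, and mysqli_query() is its counterpart in the mysqli extension):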

    $result = mysql_query($sql);

Databases, much like PHP variables, are designed to store data, so this context is safe for any data (within practical limitations such as the amount of memory or disk space available). However, this is not the case with the SQL query:

    $sql = "SELECT *
            FROM users
            WHERE username = '{$_POST['username']}'
            AND password = '{$_POST['password']}'";

This example creates an SQL injection vulnerability (and suggests that passwords are stored improperly, but that’s irrelevant to the present topic), because an attacker can provide data in $_POST['username'] and/or $_POST['password'] that modifies the format of the SQL query. For example, my favorite username for testing authentication forms is the following:

    chris' --

Because two hyphens (--) begin a comment that runs to the end of the line (MySQL also requires whitespace after the second hyphen), a query that is sent on a single line is effectively reduced to the following:

    SELECT *
    FROM users
    WHERE username = 'chris'

If this is being used to verify access credentials, I can gain access to the chris account without knowing the password. Of course, another important thing to note is that these assumptions are only true if the data in $sql is executed by a database. Until a function like mysql_query() is used, this data is just a string in a PHP variable.
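To make the change in format concrete, the string can be inspected before it is ever sent to a database. This sketch assumes the query is written on a single line and that the submitted username ends with a space after the hyphens; both values stand in for $_POST data, and the comment stripping is a naive illustration rather than real SQL parsing:

```php
<?php

// Stand-ins for $_POST['username'] and $_POST['password'].
$username = "chris' -- ";
$password = 'guess';

$sql = "SELECT * FROM users WHERE username = '{$username}' AND password = '{$password}'";

// Still just a string in a PHP variable at this point.
echo $sql, "\n";

// Naive illustration: drop everything from the comment marker onward,
// which is the part a database would ignore.
$effective = rtrim(substr($sql, 0, strpos($sql, '--')));
echo $effective, "\n";
```

The password comparison falls inside the comment, so only the username condition would remain once the database parses the query.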

Luckily, there are escaping functions that preserve data in the context of an SQL query. However, because of subtle differences in the ways various databases interpret and execute SQL, it is best to use a database-specific escaping function. For example:

    <?php

    $mysql = array();

    $mysql['username'] =
      mysql_real_escape_string($_POST['username']);

    $mysql['password'] =
      mysql_real_escape_string($_POST['password']);

    $sql = "SELECT *
            FROM users
            WHERE username = '{$mysql['username']}'
            AND password = '{$mysql['password']}'";

    ?>

Because mysql_real_escape_string() considers the character encoding of the current connection to MySQL, a connection to the database must exist.
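The mysql functions above need a live MySQL connection, so the following sketch substitutes an in-memory SQLite database accessed through PDO, with PDO::quote() as the database-specific escaping function. The table contents and the submitted values are assumptions for illustration, and the pdo_sqlite extension is assumed to be available. Note that PDO::quote() adds the surrounding quotes itself:

```php
<?php

// Assumption: in-memory SQLite via PDO stands in for the MySQL connection.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (username TEXT, password TEXT)");
$db->exec("INSERT INTO users VALUES ('chris', 'secret')");

// Stand-ins for $_POST['username'] and $_POST['password'].
$username = "chris' --";
$password = 'guess';

// Unescaped: the data changes the format of the query.
$sql = "SELECT * FROM users WHERE username = '{$username}' AND password = '{$password}'";
$unescaped_rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);

// Escaped: the data is preserved as data.
$sqlite = array();
$sqlite['username'] = $db->quote($username);
$sqlite['password'] = $db->quote($password);

$sql = "SELECT * FROM users WHERE username = {$sqlite['username']} AND password = {$sqlite['password']}";
$escaped_rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);

echo count($unescaped_rows), "\n"; // rows matched without escaping
echo count($escaped_rows), "\n";   // rows matched with escaping
```

The unescaped query matches the chris row despite the wrong password. The escaped query looks for a user literally named chris' -- and matches nothing.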

URLs

URLs adhere to a strict format, and the context of data depends upon where in the URL it is used. For example:

    $host = $_POST['host'];
    header("Location: http://{$host}/");

The context of the data in $host is intended to be the hostname of a URL used in a Location header. There isn’t actually an escaping function to help you ensure that the data within $host is only considered to be a hostname. In this case, it is actually best to ensure that the hostname adheres to the proper format, or, better, that it is one of a known set of valid values:

    <?php

    $clean = array();

    switch ($_POST['host']) {
      case 'shiflett.org':
      case 'faculty.co':
        $clean['host'] = $_POST['host'];
        break;
      default:
        /* Error: stop before the header is sent */
        exit;
    }

    header("Location: http://{$clean['host']}/");

    ?>

Without this filtering, the example illustrates an HTTP response splitting vulnerability.
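The same check can be written as a small function. This is a sketch; the function name filter_host() is hypothetical, and the list of hosts is taken from the example above:

```php
<?php

// Hypothetical helper: returns the host if it is known, or null otherwise.
function filter_host($host)
{
    $valid = array('shiflett.org', 'faculty.co');

    // Strict comparison avoids PHP's loose type juggling.
    return in_array($host, $valid, true) ? $host : null;
}

$clean = array();

// Stand-in for a malicious $_POST['host'] that tries to inject a header.
$clean['host'] = filter_host("shiflett.org\r\nSet-Cookie: injected=1");

if ($clean['host'] === null) {
    /* Error: do not send the Location header. */
    echo "Invalid host\n";
} else {
    header("Location: http://{$clean['host']}/");
}
```

Because only exact matches against known values are accepted, a value carrying extra characters never reaches header().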

In a URL, there is typically only one context in which the data is dynamic — the values of query string parameters. As demonstrated earlier, this data can be escaped in order to preserve it by simply using urlencode():

    <?php

    $url = array();

    $url['zip'] = urlencode($_POST['zip']);

    header("Location: http://host/weather.php?zip={$url['zip']}");

    ?>

Of course, understanding context doesn’t eliminate the need to filter input, and this is a perfect example. If the ZIP is expected to be five digits, a simple check will ensure that it is. In general, filtering ensures data integrity while escaping ensures data preservation.
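Such a check might look like the following sketch. The function name filter_zip() and the sample value are assumptions; the pattern accepts exactly five ASCII digits:

```php
<?php

// Hypothetical helper: returns the ZIP if it is exactly five digits, else null.
function filter_zip($zip)
{
    // \A and \z anchor the whole string, so a trailing newline is rejected.
    return (is_string($zip) && preg_match('/\A[0-9]{5}\z/', $zip) === 1)
        ? $zip
        : null;
}

$clean = array();
$url = array();

$clean['zip'] = filter_zip('10001'); // stand-in for $_POST['zip']

if ($clean['zip'] !== null) {
    // Filter first, then escape for the URL context.
    $url['zip'] = urlencode($clean['zip']);
    echo "http://host/weather.php?zip={$url['zip']}", "\n";
}
```

The urlencode() call changes nothing for five digits, but keeping it means the output stays safe even if the filtering rule is later relaxed.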

Nested Contexts

One of the most frequently asked questions on PHP mailing lists and forums is how to properly escape a link. For example:

    <a href="<?php echo $link; ?>">Click Here</a>

This becomes slightly more complicated when $link consists of other data:

    $link = "http://host/weather.php?zip={$zip}";

What needs to be escaped, and how?

The answer is to escape $zip with urlencode() first and then escape $link with htmlentities(), because the context of the data in $zip is the value of a query string parameter, and the context of the data in $link is HTML.

For example:

    <?php

    $html = array();

    $url = array();

    $url['zip'] = urlencode($zip);

    $link = "http://host/weather.php?zip={$url['zip']}";

    $html['link'] = htmlentities($link, ENT_QUOTES, 'UTF-8');

    ?>

    <a href="<?php echo $html['link']; ?>">Click Here</a>

This is why the HTML entity of an ampersand (&amp;) is used to separate query string parameters in URLs when the URLs exist within HTML.
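A sketch with two parameters shows where that entity comes from. The units parameter and both values are hypothetical; html_entity_decode() stands in for the browser reading the attribute value:

```php
<?php

$html = array();
$url = array();

// Hypothetical values; each is escaped for the URL context first.
$url['zip'] = urlencode('10001');
$url['units'] = urlencode('metric');

$link = "http://host/weather.php?zip={$url['zip']}&units={$url['units']}";

// Then the whole URL is escaped for the HTML context.
$html['link'] = htmlentities($link, ENT_QUOTES, 'UTF-8');

echo "<a href=\"{$html['link']}\">Click Here</a>", "\n";

// Decoding, as a browser does for attribute values, restores the URL.
$decoded = html_entity_decode($html['link'], ENT_QUOTES, 'UTF-8');
echo $decoded, "\n";
```

The ampersand is written as an entity in the markup, and the browser turns it back into a plain ampersand before requesting the URL.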

Until Next Time…

I hope this article helps you better understand and appreciate context. Many web application vulnerabilities can be traced to a developer’s failure to properly account for context.

Although escaping is emphasized more than filtering in this article, escaping should never be considered a substitute for filtering. (Some examples omit filtering in order to focus on context.) However, many common web application vulnerabilities such as cross-site scripting (XSS) and SQL injection are escaping problems, not filtering problems, and it’s always best to address the root cause of a problem rather than a symptom.

Until next month, be safe.