Theory

Published in PHP Architect on 18 Jul 2005

I was in Vancouver recently to give a talk at PHP West called the PHP Security Audit HOWTO. The positive response has been overwhelming, which is unusual; I typically receive very little feedback from any of my talks (or articles, for that matter).

While trying to determine the reason for the increased response, I concluded that it is mostly because the talk is more pragmatic than my others. Rather than offering sound theoretical advice, the talk consists mostly of specific strings that you can search for in your PHP code in order to focus on common points of failure and quickly locate potential security weaknesses. Many slides simply list a collection of related strings to search for, and I discuss the mistakes that I have observed most often.

As a result of this observation, I have decided to write a brief explanation of why theory is important to security. I am not trying to convince you to be less pragmatic. On the contrary, I want you to embrace your pragmatism while still adhering to theoretically sound practices that have proven their practical value. Being stubborn is not pragmatic.

Defense in Depth

One of my favorite security principles is defense in depth. This principle asserts that redundant safeguards have value. Stated differently, you can never be too safe.

The idea is pretty simple, but let me give you a practical example. If I am building an app that lets users register, I might have a form that accepts a username, password, and email address. If I only allow alphanumeric usernames, I can enforce this in my input filtering:

  <?php

  $clean = array();

  /* Accept only alphanumeric usernames. */
  if (isset($_POST['username']) && ctype_alnum($_POST['username'])) {
    $clean['username'] = $_POST['username'];
  } else {
    /* Error */
  }

  ?>
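The same inspection can be wrapped in a small function so that it is easy to test on its own. This is a minimal sketch, not the author's code: the function name filter_username() is hypothetical, and it assumes that a missing, empty, or non-string username is simply invalid. Note that the function never modifies its input; it either returns it unchanged or rejects it.

```php
<?php

/* Hypothetical helper: returns the username if it passes inspection,
   or null if it does not. The input is never modified. */
function filter_username($input)
{
    /* ctype_alnum() expects a string, so reject anything else outright
       (a missing field, an array submitted as username[]=..., and so on). */
    if (!is_string($input)) {
        return null;
    }

    /* ctype_alnum('') is false, so an empty string is rejected as well. */
    return ctype_alnum($input) ? $input : null;
}

$clean = array();
$clean['username'] = filter_username(isset($_POST['username']) ? $_POST['username'] : null);

if ($clean['username'] === null) {
    /* Error: ask the user to supply a valid username. */
}
```

Returning null for every failure keeps a single, explicit check at the call site, which makes it hard to accidentally use unfiltered data.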

Once assured that I have a valid username, I might store it in the user’s session:

  $_SESSION['username'] = $clean['username'];

Because session data is stored on the server, I know that it’s somewhat trustworthy, at least compared to data coming from the client or some other external source. Therefore, I might choose to greet the user on each page:

  echo "<p>Hello, {$_SESSION['username']}!</p>";

If you are a regular reader of Security Corner, you should know that output must always be escaped:

  <?php

  $html = array();

  /* ENT_QUOTES escapes both single and double quotes. */
  $html['username'] = htmlentities($_SESSION['username'], ENT_QUOTES, 'UTF-8');

  echo "<p>Hello, {$html['username']}!</p>";

  ?>

Astute readers might already be questioning me, noting that htmlentities() has no effect on an alphanumeric string. The username contains no special characters that need to be escaped, so this extra work appears useless. Truly pragmatic developers might even be disgusted by such an approach: the time it took to write those extra few lines of code could have been spent solving the next problem.

These arguments and criticisms all have merit, and that is often the dilemma I face when discussing the theoretical aspects of security. History has demonstrated that theoretical weaknesses often yield real vulnerabilities in time, and this is the primary reason why adhering to theoretically sound practices can save you from attacks that are either unknown or do not yet exist.

In this particular example, there are situations that can yield a cross-site scripting vulnerability if the escaping is not performed. For example, the session data store might be compromised (a trivial task in some cases, such as on a shared host where other users can write to the same session directory), so that $_SESSION['username'] no longer holds the filtered username but some tainted data instead. If the seemingly useless escaping step is taken, it can save the day.
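To see the escaping earn its keep, the sketch below simulates a compromised session store. It is illustrative only: $session is a plain array standing in for $_SESSION (an assumption made so the sketch runs without starting a session), and its value deliberately breaks the alphanumeric rule enforced at registration.

```php
<?php

/* Stand-in for a compromised $_SESSION: the stored username is no longer
   the filtered, alphanumeric value that was saved at registration. */
$session = array('username' => "<script>alert('XSS')</script>");

$html = array();
$html['username'] = htmlentities($session['username'], ENT_QUOTES, 'UTF-8');

/* The tainted value is rendered as inert text instead of executing. */
echo "<p>Hello, {$html['username']}!</p>\n";
```

Without the htmlentities() call, the same echo statement would emit a working script tag.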

This is the basic idea of defense in depth, and it demonstrates the value of sound theory. Complex systems can behave in unexpected ways, and there is a great deal of value in checking this behavior and enforcing certain constraints, even when your effort seems redundant and wasteful.

Never Correct Invalid Data

There is another security principle that points out the dangers of modifying invalid data in an attempt to make it valid. (It is not as well known as defense in depth, most likely because there is no standard name for it.) It goes against the natural instinct of many PHP developers, and it requires a strong commitment to theory to appreciate fully.

The idea is simple: input filtering is an inspection process, not a modification process. I see developers using many techniques that conflict with this principle. For example, you might want to be sure that $_GET['id'] is an integer:

  <?php

  $clean = array();

  /* Converts any input to an integer, silently discarding whatever is invalid. */
  $clean['id'] = intval($_GET['id']);

  ?>

Is this safe? The answer isn’t so straightforward. In this particular example, no vulnerability is introduced. However, the conversion hides malicious attempts, so you aren’t made aware that your application is being attacked. You could correct this by first inspecting the data and logging everything that fails your inspection, but then you might as well use that inspection as your input filtering. (Doing both is a good defense-in-depth strategy, but only because intval() should have no effect on data that has already passed inspection.)
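The inspect-then-log approach described above might look something like the following. This is a sketch under stated assumptions: filter_id() is a hypothetical name, a valid id is assumed to be a non-empty string of the digits 0 to 9, and failures are sent to PHP's error_log() rather than to any particular logging system.

```php
<?php

/* Hypothetical helper: inspects an id and logs anything that fails. */
function filter_id($input)
{
    /* ctype_digit() accepts only strings made entirely of 0-9, so signs,
       whitespace, decimal points, and trailing junk all fail inspection. */
    if (is_string($input) && ctype_digit($input)) {
        /* Redundant on data that passed inspection: defense in depth. */
        return intval($input);
    }

    /* Record the failure instead of silently "fixing" the input. */
    error_log('Invalid id rejected: ' . var_export($input, true));
    return null;
}

$clean = array();
$clean['id'] = filter_id(isset($_GET['id']) ? $_GET['id'] : null);
```

One caveat: a digit string larger than PHP_INT_MAX still passes ctype_digit(), and intval() would clamp it, so a length check is worth adding if very large ids are a concern.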

A better example of trying to correct invalid data is when developers manually attempt to eliminate directory traversal vulnerabilities:

  <?php

  $clean = array();
  $clean['filename'] = str_replace('..', '.', $_POST['filename']);

  ?>

The idea here is simple: every reference to the parent directory is replaced with a reference to the current directory. However, imagine that $_POST['filename'] contains the following:

  .../.../.../.../.../etc/passwd

This would be easy to identify as invalid input. However, because an attempt is made to correct the invalid data, $clean['filename'] becomes the following:

  ../../../../../etc/passwd
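You can confirm this transformation directly. The snippet applies the same str_replace() call from the example above to the hostile input; nothing else is assumed. Because str_replace() scans left to right and does not rescan its own output, each "..." collapses to exactly one "..".

```php
<?php

/* The "correction" from the example above, applied to hostile input. */
$filename  = '.../.../.../.../.../etc/passwd';
$corrected = str_replace('..', '.', $filename);

echo $corrected, "\n"; // ../../../../../etc/passwd
```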

This particular vulnerability could have been resolved by repeating the same string replacement until no reference to the parent directory remains (using a while loop, for example), but the point is that it’s dangerous to try to correct invalid data. It’s always safer to inspect the data and ensure that it abides by your rules. If it does not, it’s better to force the user (or other external system) to supply valid data than to try to correct the invalid data. Doing otherwise heightens your risk, and a single mistake can be disastrous.
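Here is a sketch of the inspection alternative. It rests on an assumption that the original does not state: that a valid filename is a single path segment of letters, digits, underscores, and hyphens, with an optional extension. The function name filter_filename() is hypothetical. Because slashes are never allowed, traversal is ruled out by the rule itself rather than by any attempt at repair.

```php
<?php

/* Hypothetical rule: one path segment, [A-Za-z0-9_-]+, optional extension.
   No slashes are permitted, so no traversal sequence can pass. */
function filter_filename($input)
{
    /* The D modifier stops $ from matching before a trailing newline. */
    if (is_string($input)
        && preg_match('/^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$/D', $input) === 1) {
        return $input;
    }

    return null; /* Reject; do not try to repair. */
}

$clean = array();
$clean['filename'] = filter_filename(isset($_POST['filename']) ? $_POST['filename'] : null);

if ($clean['filename'] === null) {
    /* Error: ask for a valid filename. */
}
```

If your application legitimately needs subdirectories, the rule has to change, but it should still describe what is valid rather than what is dangerous.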

Frequent Debates

There are other debates that I observe quite frequently within the community, and they are almost always the result of someone being stubborn. For example, a common debate is whether htmlentities() is really any safer than simply replacing angle brackets with their HTML entities:

  <?php

  $html = array();

  /* Escapes only angle brackets (the approach under debate). */
  $tmp = str_replace('<', '&lt;', $_POST['username']);
  $tmp = str_replace('>', '&gt;', $tmp);

  $html['username'] = $tmp;

  echo "<p>Hello, {$html['username']}!</p>";

  ?>

Aside from the question being debated, this example has another problem: it fails to filter input ($_POST['username']). Its escaping is also sub-par, but this is difficult to prove, because exploits against this approach are complex. When I want to explain to someone why they should not echo the raw $_POST['username'], I can supply a simple example that they can understand:

  "><script>alert('XSS')</script><"

This risk is much easier to convey, because the exploit is simple. When the exploit becomes complex, proving the merit of certain practices becomes difficult. For example, the exploit might require the use of a NUL byte or a different character encoding. In these cases, the vulnerability has to be explained in principle, and I often cite history as support.

For example, there have been several cross-site scripting vulnerabilities in the past that do not rely on angle brackets. Browsers have been known to interpret other characters (usually characters that look similar) as angle brackets. This is because browsers (particularly Internet Explorer) try to interpret invalid data correctly in an attempt to appear accommodating and reliable. Typographic (curly) quotes are a similar case: a literal search for quotes will not find them, but a system that tries to interpret them as ordinary quotes might be at risk.
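One concrete difference between the two approaches is easy to show, even if an exploit is not: the bracket-only replacement leaves quotes and ampersands untouched, while htmlentities() with ENT_QUOTES converts them. The function names escape_brackets() and escape_html() are hypothetical wrappers around the two techniques, and the sample input is arbitrary. This shows only how much less the bracket-only approach covers; it is not, by itself, proof of a vulnerability.

```php
<?php

/* The bracket-only escaping from the earlier example. */
function escape_brackets($input)
{
    return str_replace(array('<', '>'), array('&lt;', '&gt;'), $input);
}

/* htmlentities() with explicit quote handling and character encoding. */
function escape_html($input)
{
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

$input = 'Tom & "Jerry" <Ltd>';

echo escape_brackets($input), "\n"; // Tom & "Jerry" &lt;Ltd&gt;
echo escape_html($input), "\n";     // Tom &amp; &quot;Jerry&quot; &lt;Ltd&gt;
```

Those unescaped quotes matter as soon as output lands inside an HTML attribute value rather than between tags.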

This also highlights why a whitelist approach to filtering is safest: it’s very difficult to determine which characters are potentially malicious. It is much easier (and therefore more reliable) to determine which characters are definitely valid and to consider everything else invalid.
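The asymmetry between the two approaches can be sketched in a few lines. The function names are hypothetical, and the fullwidth less-than and greater-than signs (U+FF1C and U+FF1E) are used purely as an example of characters a blacklist author might not anticipate; whether any given browser treats them specially is beside the point. The blacklist lets them through because it only knows about the ASCII angle brackets, while the whitelist rejects them because they are not on its list of valid characters.

```php
<?php

/* Blacklist: reject only the characters known to be dangerous. */
function blacklist_ok($input)
{
    return strpbrk($input, '<>') === false;
}

/* Whitelist: accept only the characters known to be valid. */
function whitelist_ok($input)
{
    return ctype_alnum($input);
}

/* Fullwidth less-than and greater-than signs, encoded as UTF-8. */
$input = "\u{FF1C}script\u{FF1E}";

var_dump(blacklist_ok($input)); // bool(true)
var_dump(whitelist_ok($input)); // bool(false)
```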

If you consider character encoding, the advantage of mysql_real_escape_string() over addslashes() becomes clearer. I have a blog post with more details on this topic, complete with an example exploit.

Because the escaping performed by mysql_real_escape_string() takes the character encoding into account, you can be assured that what the database considers to be a single quote is what your escaping function considers to be a single quote. If your escaping function and database disagree in this regard, a security vulnerability exists. It is certainly difficult to exploit (which means primitive SQL injection tests will miss it), but it is a vulnerability that could have been avoided simply by using the correct function.
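The root of the inconsistency can be shown without a database. addslashes() works byte by byte, so it inserts a backslash (0x5C) before the single quote regardless of what precedes it. The sketch below uses the byte 0xBF as that preceding byte; this is one illustrative choice, relevant when the database reads the string as GBK. No connection is needed here, whereas mysql_real_escape_string() itself requires one, because it consults the connection's character set.

```php
<?php

/* The byte 0xBF followed by a single quote. */
$input = "\xbf'";

/* addslashes() ignores character encoding and escapes byte by byte. */
$escaped = addslashes($input);

echo bin2hex($escaped), "\n"; // bf5c27
```

In GBK, 0xBF5C is a single valid multibyte character. A database reading the string as GBK therefore sees that character followed by a bare single quote: the backslash has been absorbed, and the quote is no longer escaped.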

Until Next Time…

I hope that you can better appreciate the theory of security and the practices that I teach each month here in Security Corner. Web application security is a highly specialized discipline, and without a devotion to it, you won’t always understand why some practices are preferred over others. Each month I do my best to explain not only how to protect yourself from a particular type of attack but also why you should. This is because I understand how difficult it is to trust someone blindly, and I also realize that trust has to be earned.

You should now realize that a practice that is theoretically more secure today might be the only secure practice tomorrow. Theory is what protects us from the unknown, and with a world full of creative attackers, there is much that is unknown.

Until next month, be safe.