Zend Framework Update

11 Nov 2005

A few weeks ago, I posted my Zend Framework Wishlist. Most of the things I mentioned were off the top of my head, but I think it got people (including me) thinking about how we can make some security problems easier to solve. It also attracted the attention of Open Enterprise Trends, who interviewed me and published a story about the framework. Although it's a bit hard to tell from reading the story, I was interviewed prior to being involved.

I just signed the CLA (Contributor License Agreement) today, so now Brain Bulb is officially part of the project. I'm going to be putting a lot more thought into the topics I brought up in my wishlist, and I hope to make some positive contributions to the project and to PHP.

As you can tell from the listing in Wez's blog, there is an input filtering class called ZInputFilter. Although this is quite nice, trying to address some problems at the input stage is clumsy at best. (If you need an example, look no further than magic_quotes_gpc.) Most problems such as cross-site scripting (XSS) and SQL injection are output problems - once data enters a new context, everything changes (obvious, right?).
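To make the magic_quotes_gpc point concrete, here is a minimal sketch. It assumes that addslashes() is a fair stand-in for what magic_quotes_gpc does to incoming data, and the variable names are only illustrative:

```php
<?php
// Pretend this value arrived in a form submission.
$submitted = "O'Reilly";

// Escape at the input stage, before we know where the data is going.
// addslashes() stands in for magic_quotes_gpc here (an assumption).
$quoted = addslashes($submitted);

// Later the value ends up in HTML, where the backslash is just noise.
$html = htmlspecialchars($quoted, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
```

The browser ends up showing O\'Reilly, because the escaping chosen at the input stage does not match the context the data finally entered.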

Security expert Nitesh Dhanjani correctly notes that a lack of output escaping causes XSS vulnerabilities. (He also notes that the most common mistake is to consider it an input filtering problem.) In the comments that follow his post, examples are provided to illustrate how various languages and platforms handle this particular problem. None of them are too bad, but I'm just not impressed. There must be a better solution.

When developing PHP applications, most of our work involves dealing with strings. When preparing content to be displayed in a browser, we sometimes add markup to those strings:

    $first_name = 'Chris';
    $last_name = 'Shiflett';
    $city = 'New York';
    $state = 'NY';

    $name = "<b>$first_name $last_name</b>";
    $location = "<i>$city, $state</i>";

    echo "<p>My name is $name, and I live in $location.</p>";

This simple example demonstrates how easy it is to mix dynamic and static data, and the following figure illustrates this further:

The problem is how to make it easy for a developer to guarantee that $first_name, $last_name, $city, and $state are going to be treated as raw data. They need to be escaped with htmlentities() (or htmlspecialchars()), but $name and $location are clearly meant to contain markup, so escaping them would not be desirable. In other words, we often create strings in PHP that contain disparate types of data, and problems can arise when those types are not handled properly. Currently, a developer's only defense is knowledge and discipline, but I think we can do better.
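For reference, this is roughly what the knowledge-and-discipline approach looks like today. It is only a sketch: h() is a hypothetical helper of my own naming, not part of any framework, and the hostile value for $city is made up for illustration:

```php
<?php
// Hypothetical helper: escape raw data for use in HTML.
function h($raw)
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

$first_name = 'Chris';
$last_name  = 'Shiflett';
$city       = '<script>alert("XSS")</script>'; // hostile value for illustration
$state      = 'NY';

// Raw data is escaped; the markup around it is not.
$name     = '<b>' . h($first_name) . ' ' . h($last_name) . '</b>';
$location = '<i>' . h($city) . ', ' . h($state) . '</i>';

echo "<p>My name is $name, and I live in $location.</p>\n";
```

Nothing here stops a developer from forgetting one of the h() calls, and nothing marks $name and $location as already safe to output.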

When sending data to a database, this problem has already been solved. Bound parameters guarantee that raw data never enters a context where it can be considered anything but data:

    $db = new mysqli('localhost', 'user', 'pass', 'database');

    // The statement contains placeholders (?) instead of data.
    $query = $db->prepare('SELECT *
                           FROM   users
                           WHERE  username = ?
                           AND    password = ?');

    // Both values are bound as strings ('ss'), so they are only ever treated as data.
    $query->bind_param('ss', $username, $password);
    $query->execute();

(This isn't meant to teach you how to use bound parameters. For more information, see Zak Greant and Georg Richter's article on ext/mysqli or the "Prepared Statements" section of Wez Furlong's article on PDO.)

This example demonstrates how a developer can indicate which parts of a string are meant to be raw data ($username and $password in this case) and which parts are meant to be interpreted (the SQL statement). It is this separation that is key.

I think this solution works well because it costs the developer very little: interacting with a database isn't much easier without the use of bound parameters. Sending data to the client, by contrast, is as easy as using echo, so any solution to the XSS problem can't be too cumbersome. However, the developer must indicate the distinction between these disparate types of data, so some overhead in terms of syntax is unavoidable.
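To give a feel for how small that overhead might be, here is one possible shape, borrowing the placeholder idea from bound parameters. This is purely a sketch under my own assumptions: html_bind() is a hypothetical function, not something the Zend Framework or PHP provides, and it only handles escaping for HTML text and quoted attribute values:

```php
<?php
// Hypothetical function: fill each "?" in $template with an escaped value.
function html_bind($template, array $values)
{
    $parts = explode('?', $template);
    if (count($parts) - 1 !== count($values)) {
        throw new InvalidArgumentException('Placeholder count does not match value count.');
    }

    // Start with the markup before the first placeholder, then alternate
    // between an escaped value and the markup that follows it.
    $out = array_shift($parts);
    foreach (array_values($values) as $i => $value) {
        $out .= htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . $parts[$i];
    }

    return $out;
}

echo html_bind(
    '<p>My name is <b>? ?</b>, and I live in <i>?, ?</i>.</p>',
    array('Chris', 'Shiflett', 'New York', 'NY')
), "\n";
```

The template is interpreted as markup and the values are always treated as raw data, which is the same separation that bound parameters provide. An obvious limitation of this sketch is that the template itself can't contain a literal question mark.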

I'll post more once I've had a few beers. :-) I welcome your comments and suggestions.