With any luck, Geoff and I will be giving a PHP testing tutorial at this year's ApacheCon. Here's a snippet of the abstract:
Admit it - deep down inside, you know you should be testing your PHP applications. With all of the different PHP test environments and the daunting documentation, sometimes it is difficult to know where to start. This tutorial will help. The first step in testing is deciding what to test, so we will begin by offering a very simple (but not contrived) PHP application and identifying elements that lend themselves to testing - both unit tests and functional tests. Next, we will write some real tests using several of the existing PHP testing frameworks, including PHPUnit, SimpleTest, phpt, and Apache-Test.
Unfortunately, testing hasn't really caught on in the PHP community for some reason, despite the existence of several useful tools and resources.
We need a few more people to register in order to have the opportunity to give this tutorial, so please sign up soon. If you do so before November 20, you get a $100 discount.
A few weeks ago, I posted my Zend Framework Wishlist. Most of the things I mentioned were off the top of my head, but I think it got people (including me) thinking about how we can make some security problems easier to solve. It also attracted the attention of Open Enterprise Trends, who interviewed me and published a story about the framework. Although it's a bit hard to tell from reading the story, I was interviewed prior to being involved.
I just signed the CLA (Contributor License Agreement) today, so now Brain Bulb is officially part of the project. I'm going to be putting a lot more thought into the topics I brought up in my wishlist, and I hope to make some positive contributions to the project and to PHP.
As you can tell from the listing in Wez's blog, there is an input filtering class called ZInputFilter. Although this is quite nice, trying to address some problems at the input stage is clumsy at best. (If you need an example, look no further than magic_quotes_gpc.) Most problems such as cross-site scripting (XSS) and SQL injection are output problems - once data enters a new context, everything changes (obvious, right?).
Security expert Nitesh Dhanjani correctly notes that a lack of output escaping causes XSS vulnerabilities. (He also notes that the most common mistake is to consider it an input filtering problem.) In the comments that follow his post, examples are provided to illustrate how various languages and platforms handle this particular problem. None of them are too bad, but I'm just not impressed. There must be a better solution.
When developing PHP applications, most of our work involves dealing with strings. When preparing content to be displayed in a browser, we sometimes add markup to those strings:
$first_name = 'Chris';
$last_name = 'Shiflett';
$city = 'New York';
$state = 'NY';
$name = "<b>$first_name $last_name</b>";
$location = "<i>$city, $state</i>";
echo "<p>My name is $name, and I live in $location.</p>";
This simple example demonstrates how easy it is to mix dynamic and static data, and the following figure illustrates this further:
The problem is how to make it easy for a developer to guarantee that $first_name, $last_name, $city, and $state are going to be treated as raw data. They need to be escaped with htmlentities() (or htmlspecialchars()), but $name and $location are clearly meant to contain markup, so escaping them would not be desirable. In other words, we often create strings in PHP that contain disparate types of data, and problems can arise when it is not properly handled. Currently, a developer's only defense is knowledge and discipline, but I think we can do better.
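To make the "knowledge and discipline" approach concrete, here is a minimal sketch of the earlier example with each raw value escaped at the moment it enters the HTML context. The helper name e() is hypothetical shorthand for this illustration (it is not a PHP built-in), and UTF-8 output is assumed.

```php
<?php
// Hypothetical helper (not part of PHP): escape raw data for an HTML context.
// Assumes the page is served as UTF-8.
function e(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

$first_name = '<script>alert(1)</script>';  // hostile value, for illustration
$last_name  = 'Shiflett';
$city       = 'New York';
$state      = 'NY';

// Escape only the raw data; the markup is added afterwards and left alone.
$name     = '<b>' . e($first_name) . ' ' . e($last_name) . '</b>';
$location = '<i>' . e($city) . ', ' . e($state) . '</i>';

$html = "<p>My name is $name, and I live in $location.</p>";
echo $html, "\n";
```

The intended markup survives, while the hostile value is rendered as harmless text. The weakness is the one described above: nothing stops a developer from forgetting a single e() call.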
When sending data to a database, this problem has already been solved. Bound parameters guarantee that raw data never enters a context where it can be considered anything but data:
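$db = new mysqli('localhost', 'user', 'pass', 'database');
// "users" is an assumed table name, used only for illustration.
$query = $db->prepare('SELECT *
                       FROM users
                       WHERE username = ?
                       AND password = ?');
// 'ss' declares both bound values as strings.
$query->bind_param('ss', $username, $password);
$query->execute();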
(This isn't meant to teach you how to use bound parameters. For more information, see Zak Greant and Georg Richter's article on ext/mysqli or the "Prepared Statements" section of Wez Furlong's article on PDO.)
This example demonstrates how a developer can indicate which parts of a string are meant to be raw data ($username and $password in this case) and which parts are meant to be interpreted (the SQL statement). It is this separation that is key.
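The separation is easy to observe in a self-contained sketch. The following uses PDO with an in-memory SQLite database (so it assumes the pdo_sqlite extension is available); the users table, its columns, and the sample row are assumptions made purely for this example.

```php
<?php
// Sketch only: in-memory SQLite via PDO (requires the pdo_sqlite extension).
// The "users" table and its contents are assumptions made for this example.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (username TEXT, password TEXT)');
$db->exec("INSERT INTO users VALUES ('chris', 'secret')");

// A classic injection attempt; as a bound parameter it is only ever data.
$username = "' OR '1'='1";
$password = "' OR '1'='1";

$query = $db->prepare('SELECT COUNT(*) FROM users WHERE username = ? AND password = ?');
$query->execute([$username, $password]);
$bound_matches = (int) $query->fetchColumn();

// For contrast, interpolating the same values lets them rewrite the query itself.
$unsafe = "SELECT COUNT(*) FROM users WHERE username = '$username' AND password = '$password'";
$unsafe_matches = (int) $db->query($unsafe)->fetchColumn();

echo "bound: $bound_matches, interpolated: $unsafe_matches\n";
```

With bound parameters the hostile strings match no rows, while the interpolated version matches the one row in the table, because the quotes in the data become part of the SQL.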
I think this solution works well, because interacting with a database isn't much easier without the use of bound parameters. By contrast, sending data to the client is as easy as using echo, so any solution to the XSS problem can't be too cumbersome. However, the developer must indicate the distinction between these disparate types of data, so some overhead in terms of syntax is unavoidable.
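To give a feel for how much syntax overhead that might mean, here is one possible shape, borrowed directly from bound parameters. This is only a sketch, not a proposal for the framework: html_bind() is a hypothetical function invented for this illustration, it assumes UTF-8, and it assumes the template contains no literal question marks.

```php
<?php
// Hypothetical helper, named here only for illustration: fills each "?" in an
// HTML template with the next value, escaped for HTML. Assumes UTF-8 and that
// the template itself contains no literal "?" characters.
function html_bind(string $template, array $values): string
{
    $parts = explode('?', $template);
    if (count($parts) - 1 !== count($values)) {
        throw new InvalidArgumentException('Placeholder count does not match value count.');
    }
    // Start with the static text before the first placeholder.
    $out = array_shift($parts);
    foreach (array_values($values) as $i => $value) {
        // Escaped raw data, followed by the next piece of static markup.
        $out .= htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . $parts[$i];
    }
    return $out;
}

$html = html_bind(
    '<p>My name is <b>? ?</b>, and I live in <i>?, ?</i>.</p>',
    ['Chris', '<Shiflett>', 'New York', 'NY']
);
echo $html, "\n";
```

The template is the part meant to be interpreted, and the array holds the raw data, so the developer states the distinction once instead of remembering to escape each value by hand.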
I'll post more once I've had a few beers. :-) I welcome your comments and suggestions.
There has been much discussion recently about Sony's rootkit that is bundled with some corrupted CDs. The EFF lists some of the corrupted CDs, and David Sklar suggests building a corrupt CD tracker (using Ning). There is already at least one exploit that takes advantage of Sony's new software.
It's nice to see that Computer Associates has stepped up and called this what it is - a trojan. (They also correctly label Sony's "Music Player" as spyware.)
I just noticed that another prominent member of the PHP community has started a blog. Richard Davey has been answering questions on various PHP mailing lists and forums for years, and now he has his own blog. Are you subscribed?
Note: The list of blogs I read is much more complete than the feed. Anyone know if this is a known del.icio.us bug?
One of his first entries is a response to a rather poor list of PHP tips. Richard counters many of the supposed performance tips that focus on trivial details such as single quotes versus double quotes. He also mentions the importance of readability. I think this is a point that is often lost, especially for developers who seek to impress others with the complexity of their code. Personally, I feel successful when my solution to a problem is unimpressive and painfully simple.
One thing that surprises me is that Richard seems to agree that concatenation is more readable than interpolation. Personally, I prefer interpolation:
$name = 'Chris';
$location = 'New York';
echo "My name is $name, and I live in $location.";
I think concatenation is less clear:
$name = 'Chris';
$location = 'New York';
echo 'My name is ' . $name . ', and I live in ' . $location . '.';
You could argue that concatenation is clearer to PHP, but I try to focus on what's clearer to the developer. Which do you prefer?
Update: Richard says he usually prefers interpolation. I guess we agree on all points. :-)