Code Audits, by Chris Shiflett

Published in PHP Architect on 21 Sep 2005

Before you get started with any task, it’s always best to know what exactly you’re doing. Before you can effectively audit a PHP app, you need to clearly define your task and its associated goals. What are you doing, and why are you doing it? What is a PHP security audit?

An audit is an examination, so a PHP security audit is primarily an examination of a PHP app’s source code. In other words, it’s a code review with a narrow focus: security. It also covers a few less concrete points of interest, including the software design and the PHP configuration.

For an audit to be as valuable as possible, it is important that nothing be off-limits. The idea that “a chain is only as strong as its weakest link” certainly applies, so you want access to everything. Otherwise, you might not find the “weakest link” in the application.

Setting the Bar

There are a few steps to take before you actually start examining the source code. One of the first steps is to determine how much security is required. In other words, you need to set the bar.

I recommend starting with a minimum goal. It’s very difficult to gauge the effort required to audit an application, and you should focus on the most important things first. Depending on your employment situation, you might have new responsibilities materializing as you go, new assignments to steal your focus, or you might simply run out of time. Rather than risk focusing on the details and missing something major and obvious, it’s best to look at the big picture first. You can always dive into the details later.

The bar that I use is that every PHP app should at least filter input and escape output (FIEO). This involves more than just performing these steps — it also suggests that the app should make sure that tainted data cannot possibly be mistaken for filtered data, the filtering process cannot be avoided by a clever attacker, and the like.

For example, while you’re examining the source code, you might encounter code like the following:

<a href="index.php?action=<?php echo $action; ?>">
<?php echo $desc; ?>
</a>

It’s hard to tell from this snippet alone whether $action and $desc have been properly filtered and escaped. That ambiguity is itself a security risk.

Filtering is unnecessary if $action is set in the code and does not come from a remote source. It’s still better to adhere to a strict naming convention. Otherwise, it’s difficult to tell whether $action is filtered or tainted:

<?php
$clean = array();
/* … */
$clean['action'] = 'register';
?>
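
For contrast, here is a minimal sketch of the case where the value does come from a remote source. The filter_action() helper, the list of allowed actions, and the $tainted array (standing in for $_GET) are assumptions made for illustration; they are not part of the application being discussed:

```php
<?php
// Hypothetical whitelist filter: returns the value only if it is one of
// the actions the application expects, and null otherwise.
function filter_action($input)
{
    $allowed = array('register', 'login', 'logout'); // assumed action names
    if (is_string($input) && in_array($input, $allowed, true)) {
        return $input;
    }
    return null;
}

$clean = array();

// Stands in for $_GET in this sketch.
$tainted = array('action' => 'register');

$action = filter_action(isset($tainted['action']) ? $tainted['action'] : null);
if ($action !== null) {
    // Only data that has passed the filter is ever stored in $clean.
    $clean['action'] = $action;
}
```

Because nothing reaches $clean without passing the filter, an auditor can treat any other variable as tainted until proven otherwise.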

Remember, you must filter input, but data that is not input does not have to be filtered. If this data is used as output, however, it still needs to be escaped. Because $action is being used as the value of a parameter in the query string, the escaping that is required is URL encoding:

<?php
$url = array();
/* … */
$url['action'] = urlencode($clean['action']);
/* $url['action'] is escaped. */
?>

This illustrates a point of confusion for many developers, especially those new to security concerns. When identifying output, anything sent to the client is output, even URLs or form data that is ultimately sent back to the server. In this case, $action is being sent to the client. Do not confuse this with $_GET['action'], the variable you reference when the user clicks on this link.

Because $desc is sent to the client, it must also be escaped. In this case, the proper escaping is htmlentities(). Here is an example that assumes $clean['desc'] is the filtered description:

<?php
$html = array();
/* … */
$html['desc'] = htmlentities($clean['desc'], ENT_QUOTES, 'UTF-8');
/* $html['desc'] is escaped. */
?>
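
As a rough sketch of how the two escaping steps combine when the link is built, the following uses hypothetical values for $clean and a $link variable that is not part of the original example:

```php
<?php
// Hypothetical filtered values; in a real application these would come
// from the filtering step.
$clean = array();
$clean['action'] = 'register';
$clean['desc'] = 'Sign up & "join" us';

// Escape for the query string.
$url = array();
$url['action'] = urlencode($clean['action']);

// Escape for the HTML context.
$html = array();
$html['desc'] = htmlentities($clean['desc'], ENT_QUOTES, 'UTF-8');

// Only escaped values are concatenated into the markup.
$link = '<a href="index.php?action=' . $url['action'] . '">'
      . $html['desc'] . '</a>';

echo $link, "\n";
```

Keeping the escaped copies in $url and $html makes the intended context of each value visible at the point of output. The output of urlencode() contains only alphanumeric characters plus -, _, ., +, and %, so it is also safe inside the quoted href attribute.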

Of course, when you’re just auditing code, your task is to make sure that these required steps have been taken. Identifying failures is enough, although providing an exploit can often help to clarify a vulnerability.

Analyzing the Design

Another step before you look at the code in detail is to analyze the design. I always begin this process by having the design explained to me, preferably by the developers. No one knows an application as well as the developers, and implementation often strays slightly from the documentation that might be available, so the developers are the only reliable source of information in this regard.

A poor or unnecessarily complex design is a security risk. It can be the most impressive design you have ever seen, but if the developers can’t properly explain it due to its complexity, then it represents a possible security hole. Complexity breeds mistakes, and mistakes frequently yield security vulnerabilities.

Another indication of a design problem is when tracking data is difficult. Can you easily track data from the point where it enters the system to the point where it exits, including transformations? If not, then it's likely that the developers can’t either, and this is a security risk.

As noted in the previous section, it’s also important that distinguishing between tainted and filtered data is made easy. If this is difficult, then developers are more likely to mistake tainted data for filtered data, and will almost certainly write vulnerable code.

Lastly, security must be part of the design. A design with no mechanisms to help promote security is the biggest mistake you can identify. Security-conscious developers cannot compensate for a lack of security in the design, and many PHP applications suffer from this. Without a secure design, developers are destined to be perpetually patching security vulnerabilities.

Analyzing the Configuration

The last step to take before examining the source code is to analyze the configuration. PHP’s configuration is mostly dictated by php.ini, but don’t forget that it can also be modified by things like httpd.conf, .htaccess files, and ini_set().

Things to avoid include:

register_globals = On
allow_url_fopen = On
display_errors = On
magic_quotes_gpc = On

In general, if the security of the application depends upon the configuration, this is a risk that needs to be mitigated.
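
One way to spot-check these settings at run time is ini_get(). The sketch below is only an illustration: ini_flag_enabled() is a hypothetical helper, not a PHP function, and the set of strings it treats as "enabled" is an assumption. Run it through the same web server setup as the application, because the command line can load a different php.ini:

```php
<?php
// Hypothetical helper: interpret a value returned by ini_get() as a flag.
// ini_get() returns false when the running PHP version does not know the
// directive, which this sketch treats as "not enabled".
function ini_flag_enabled($value)
{
    if ($value === false) {
        return false;
    }
    $value = strtolower(trim((string) $value));
    // Assumed spellings of "enabled". display_errors can also hold
    // "stdout" or "stderr", which this sketch does not handle.
    return in_array($value, array('1', 'on', 'yes', 'true'), true);
}

$directives = array(
    'register_globals',
    'allow_url_fopen',
    'display_errors',
    'magic_quotes_gpc',
);

foreach ($directives as $name) {
    if (ini_flag_enabled(ini_get($name))) {
        echo "Review: $name is enabled\n";
    }
}
```

A check like this only reports the effective value for the script that runs it; settings changed per directory or with ini_set() elsewhere still need to be found by reading the configuration and the code.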

Examining the Source

Now you’re ready to actually start examining the source, but where do you start? This is where it is important to have already set the bar. To check whether an application adheres to FIEO, there are two steps to take:

Identify input, and trace it forward.
Identify output, and trace it backward.

These steps are a bit redundant, but they can provide you with two different perspectives, and taking both steps can help eliminate failures. If you’re auditing an application, you’re being trusted to identify all major vulnerabilities. Redundancy is good.

There are several ways to identify input, and I usually use grep or some custom utilities to help me search. HTML forms are the primary way that an application receives input from the user, and there are several strings you can search for to help you find them:

form
input
radio
select
checkbox
$_GET
$_POST
$_REQUEST

Databases are probably the second most common source of input, and SELECT statements are worth inspecting manually. Remember that SQL keywords are case-insensitive. Discovering a developer’s habits can help, because developers tend to be consistent. However, you can’t rely on this, so it’s best to perform a case-insensitive search.

HTTP headers can be accessed directly in PHP, so this is something else worth searching for. Some helpful strings to search for include:

$_COOKIE
$_SERVER

Remember, locate sources of input and trace the code forward.
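
If grep is not at hand, a small script can perform the same kind of case-insensitive search. The following is a minimal sketch, not one of the custom utilities mentioned above: find_markers() and the sample source are hypothetical, and the markers are the search strings already listed.

```php
<?php
// Hypothetical helper: return an array that maps 1-based line numbers to
// the markers found on that line, using a case-insensitive search.
function find_markers($source, array $markers)
{
    $hits = array();
    $lines = preg_split('/\r\n|\r|\n/', $source);
    foreach ($lines as $index => $line) {
        foreach ($markers as $marker) {
            if (stripos($line, $marker) !== false) {
                $hits[$index + 1][] = $marker;
            }
        }
    }
    return $hits;
}

$input_markers = array(
    'form', 'input', 'radio', 'select', 'checkbox',
    '$_GET', '$_POST', '$_REQUEST', '$_COOKIE', '$_SERVER',
);

// Made-up source to search; a real audit would read each file instead.
$sample = <<<'SRC'
$name = $_POST['name'];
$rows = mysql_query("SELECT id FROM users");
echo 'done';
SRC;

$hits = find_markers($sample, $input_markers);
print_r($hits);
```

A plain substring search like this produces false positives (for example, "form" inside "format"), so each hit is a starting point for manual inspection, not a finding.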

In the next step, where you identify output and trace it backward, you should be able to discover the same vulnerabilities.

As with input, there are several ways to identify output. The major recipient of output from a PHP application is the client, and there are several ways to send output to it. The following strings are useful searches:

echo
print
<?=

Any query sent to a database is output, even if the purpose of the query is to retrieve information. Thus, you can simply search for whatever function the developer uses to execute a query, such as mysql_query().

Another thing that can help identify escaping problems is to search for code that unescapes data:

stripslashes()
urldecode()
html_entity_decode()

These functions should almost never be necessary, and their presence is worth inspection.
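
To see why an extra decoding step deserves a second look, consider urldecode(). PHP already decodes the values in $_GET, so decoding again can turn data that looked harmless during filtering into something else. A minimal sketch with a hypothetical value:

```php
<?php
// Hypothetical value as it would appear in $_GET after the client sent
// "%2527": PHP has already decoded "%25" to "%", leaving the text "%27".
$received = '%27';

// A check that looks for a single quote finds none at this point.
$has_quote_before = (strpos($received, "'") !== false);

// A second, unnecessary decode produces the quote after the check.
$decoded_again = urldecode($received);
$has_quote_after = (strpos($decoded_again, "'") !== false);

var_dump($has_quote_before, $has_quote_after);
```

When one of these functions appears between the filtering step and the point of output, check whether the decoded value is filtered or escaped again before it is used.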

There are many other things worth searching for, and this article is meant only as a starting point. Over time, you’ll discover the methods that work best for you.

Until Next Time…

I hope you now feel more comfortable auditing PHP code. Remember that even if you don’t find every single vulnerability in a PHP application, your time can still be valuable. Peer reviews are a frequently neglected asset of development teams, and I hope to encourage this practice within the PHP community.

Until next month, be safe.

Chris Shiflett