Ideology

Published in PHP Architect on 15 Nov 2004

Welcome to another edition of Security Corner. This month's topic is ideology: the theory and practices behind secure programming. Studying specific attacks is necessary to understand why certain practices matter, but adhering to a strict ideology is what can protect you against unknown attacks.

Some of the things I'll be discussing include data flow and naming conventions. Whether you are a developer yourself or manage a group of developers, the most important thing you can do is keep track of data - where and how it enters and exits your application.

Data

A security-conscious developer thinks of data in two groups: data that has been filtered and data that has not.

Filtered data is data that you have filtered or that you create yourself. For example, anything hard-coded can be safely considered filtered:

    <?php

    $magazine = 'php|architect';

    ?>

The important part is php|architect - this is the data. The variable, $magazine in this case, is just a container for the data, so don't focus on this. In fact, by their very nature, variables can contain any data at any given time. It's up to the developer to keep up with what is in a particular variable at a particular time. This can become a very complex task, and this is why many people use debuggers (or a liberal use of echo) during development.

In some cases, you want a particular variable to be assigned some data and never overwritten. This may be the case for configuration information, for example. In these cases, don't use a variable at all - use a constant:

    <?php

    define('MAGAZINE', 'php|architect');

    ?>

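The nice thing about constants, as their name suggests, is that they don't change. Once you assign a constant a value, you can be assured that it will have that value for the entire life of the script. For example, the following script outputs php|architect, because the second call to define() fails (PHP reports that the constant is already defined) and leaves the original value in place: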

    <?php

    define('MAGAZINE', 'php|architect');
    define('MAGAZINE', 'Wired');
    echo MAGAZINE;

    ?>

This takes us back to the topic of ideology. Let's assume that you never want to refer to another magazine, but you prefer variables - maybe constants look funny to you. As long as your code never reassigns the variable, you're safe, right?

It's true - variables don't magically change value unless you reassign them (thankfully - otherwise programming would be a nightmare). However, using a constant has a couple of notable advantages, particularly when it comes to security:

Of course, there is also the combination of these things - it is clear to other developers that this particular data container is never meant to change. How else can you communicate this? You can always write a comment, or use a naming convention like $dontchange_magazine, but these methods are not nearly as certain or elegant as using a constant.
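To make the contrast concrete, here is a minimal sketch that sets a variable and a constant side by side and then tries to overwrite both. The variable name and the replacement value are only illustrative; the @ operator is used here solely to hide the message PHP raises when define() is called for a constant that already exists.

```php
<?php

$magazine = 'php|architect';
define('MAGAZINE', 'php|architect');

// Reassigning the variable succeeds silently, whether or not it was intended.
$magazine = 'Wired';

// Redefining the constant fails: define() returns false and the
// original value stays in place. The @ only suppresses the message
// PHP would otherwise report about the duplicate definition.
$redefined = @define('MAGAZINE', 'Wired');

echo $magazine, "\n";   // the variable now holds the new value
echo MAGAZINE, "\n";    // the constant still holds the original value
```

The variable has quietly taken the new value, while the constant still holds php|architect and $redefined is false.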

Data Flow

If you're primarily concerned with security, it's unlikely that you'll be focusing on hard-coded data, because input (remote data) is where the risks lie. There are many types of input, and some examples include:

The user can send data in three primary ways: in the URL (GET data), in the content of a request (POST data), and in cookies.

HTTP headers can be considered a fourth way, and in fact this is how cookies are sent, but only the data that you use in your programming logic is a concern - unused data does not pose a serious threat. However, it is important to realize that everything within an HTTP request is input.

PHP helps you keep up with where data originates, particularly data sent by the user. $_GET, $_POST, and $_COOKIE are all superglobal arrays that are very easy to recognize.

These should not be reassigned. For example, you might be tempted to do the following:

    <?php

    if (!valid_magazine($_POST['magazine'])) {
        $_POST['magazine'] = '';
    }

    ?>

Assuming valid_magazine() is a function that correctly determines whether a particular magazine name is valid, this approach seems to assure us that $_POST['magazine'] can be trusted from this point forth. Even so, it is a very dangerous habit, because one of the hallmarks of a good PHP developer is a natural inclination to be suspicious of any data stored within $_POST (and $_GET, $_COOKIE, etc.). Whether you are the developer yourself or you manage a group of developers, this is a habit you want to foster, not erode.

Naming Conventions

If you shouldn't put validated data back into its original variable (in the case of $_GET, $_POST, and $_COOKIE), where should you put it? I've partially answered this question here in Security Corner before, but it's worth mentioning again. You want filtered data to be easy to recognize, and this is where a strict naming convention can be extremely valuable.

The naming convention you follow doesn't matter, as long as it has a few notable characteristics:

This last point is more a characteristic of a secure design, a topic discussed in a previous edition. However, the other two characteristics depend entirely on the naming convention that you use. The convention I often use, and the one I've mentioned here before, is best illustrated by example:

    <?php

    $clean = array();

    if (ctype_alnum($_POST['username'])) {
        $clean['username'] = $_POST['username'];
    }

    ?>

The variable name I use is $clean, which is a single array that contains all filtered (clean) data. By always initializing this variable, I can be assured that it only contains data that is specifically assigned to it in my programming logic (even if register_globals is enabled). All that is required after the initialization is that I only put data within it once that data has been filtered. In this example, $clean['username'] is guaranteed to always be a string that consists of only alphabetic and numeric characters. Even if I get lost in the complexity of my application, the worst thing that can happen is that I reference $clean['username'] before it is set. If you develop with error_reporting set to E_ALL, you can often catch these errors.
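The same convention can be applied to the earlier magazine example without ever writing back into $_POST. The following is only a sketch under stated assumptions: valid_magazine() is implemented here as a hypothetical whitelist with placeholder names, and filter_magazine_form() is a helper invented for this illustration that takes the raw input as a parameter so it can be exercised outside a web request.

```php
<?php

// Hypothetical whitelist check; the allowed names are placeholders.
function valid_magazine($name)
{
    $allowed = array('php|architect', 'Wired');

    // Strict comparison, so only exact string matches pass.
    return is_string($name) && in_array($name, $allowed, true);
}

// Hypothetical helper: returns a freshly initialized $clean array that
// contains only the values that passed their filter.
function filter_magazine_form(array $input)
{
    $clean = array();

    if (isset($input['username'])
        && is_string($input['username'])
        && ctype_alnum($input['username'])) {
        $clean['username'] = $input['username'];
    }

    if (isset($input['magazine']) && valid_magazine($input['magazine'])) {
        $clean['magazine'] = $input['magazine'];
    }

    return $clean;
}

// The invalid magazine name is simply never copied into $clean.
$clean = filter_magazine_form(array(
    'username' => 'chris',
    'magazine' => '<script>',
));

print_r($clean);
```

In an application, the call would presumably be filter_magazine_form($_POST). The raw input is left untouched and remains suspect, while anything found in $clean is known to have passed a filter.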

$_REQUEST

I want to add a quick note about $_REQUEST before closing.

Don't use it.

$_REQUEST hides the source of data, much like register_globals. While it is slightly better in the sense that it is clearly input (whereas register_globals can make it difficult to distinguish between remote and local data), the difference between GET and POST data is significant. Consider the following excerpt from RFC 2616, the HTTP specification:

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

If you can't distinguish between GET and POST, you cannot adhere to the specification. While this may seem like a harmless violation, there are also security concerns - particularly cross-site request forgeries. This is an attack that I've written about in php|architect before but never specifically in Security Corner. It is a likely topic for a future edition.
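As a rough illustration of keeping GET and POST apart, the sketch below only honors a state-changing action when it arrives by POST, and it reads the action from the POST data alone. The function name requested_action() and the 'action' and 'subscribe' values are assumptions made for this example, not part of any existing application.

```php
<?php

// Hypothetical helper: returns the action to perform, or null when the
// request must be treated as a plain retrieval.
function requested_action($method, array $post)
{
    // Only POST may trigger a change, and the action name is read from
    // the POST data alone, so a value placed in the URL is never used.
    if ($method === 'POST'
        && isset($post['action'])
        && $post['action'] === 'subscribe') {
        return 'subscribe';
    }

    return null;
}

// A GET request never yields an action, whatever it carries.
$from_get = requested_action('GET', array('action' => 'subscribe'));

// A POST request with the expected field does.
$from_post = requested_action('POST', array('action' => 'subscribe'));
```

In a web context the call would presumably be requested_action($_SERVER['REQUEST_METHOD'], $_POST). With $_REQUEST this distinction cannot be made, because the source of each value is hidden. Separating the two is not a complete defense against request forgeries on its own, but it is a prerequisite for following the specification.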

Until Next Time...

While discussing ideology isn't as exciting as examining specific attacks, it is equally important. I hope that I've been able to strengthen your theoretical foundation, and this is a topic that I'm likely to revisit from time to time.

By adhering to theoretically pure ideologies and methods, you can protect yourself from the unknown and avoid having to "upgrade" your approach when new attacks are discovered. If you have some ideologies of your own that you wouldn't mind sharing, please feel free to let me know. Until next month, be safe.