More on Filtering Input and Escaping Output

08 Feb 2005

In my previous blog entry, I summarized the two most important steps (in my opinion) that all PHP developers should take to help secure their applications: filter input and escape output.

These are essentially "the least you can do" in terms of security. I consider anything less to be negligent (we all make mistakes, but these mistakes should be the exception and not the norm).

To my surprise, this simple statement has already been misinterpreted, and this is what prompted me to try to clarify things. Robert Peake writes:

Chris Shiflett has an interesting post on his blog wherein he declares that all PHP security vulnerabilities come from either a lack of filtering input or escaping output.

I hope that's not what I said, especially since it is wrong. :-) Filtering input and escaping output certainly aren't going to protect you from everything, but these two steps can improve the security of your applications substantially with very little effort.
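As a rough illustration of the two steps, here is a minimal sketch. The helper names `filter_username()` and `escape_html()`, the `username` form field, and the 32-character limit are assumptions made for this example, not part of any library or of the original list. The idea is to accept only data that passes a whitelist check, keep it in a separate `$clean` array, and escape it for the context in which it is output (HTML here).

```php
<?php
// Hypothetical helper: accept a username only if it is a non-empty
// string of at most 32 alphanumeric characters; otherwise reject it.
function filter_username($raw)
{
    if (is_string($raw) && strlen($raw) <= 32 && ctype_alnum($raw)) {
        return $raw;
    }
    return null;
}

// Hypothetical helper: escape a value for output in an HTML context.
function escape_html($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Only data that has passed a filter is ever placed in $clean.
$clean = array();

$input = isset($_POST['username']) ? $_POST['username'] : null;
$name = filter_username($input);

if ($name !== null) {
    $clean['username'] = $name;
    echo '<p>Welcome, ', escape_html($clean['username']), '</p>';
}
```

Escaping a value that has already been filtered may look redundant, but the two steps guard against different mistakes, and escaping depends on where the data is going rather than where it came from.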

Of course, my simple list leaves out many details, and that's fine. As I mentioned before, this list provides a broad perspective that helps to keep you on track while you focus on the details. I'm trying to help you focus on what's most important, because it's not always practical to implement every safeguard that you know.

The challenge is identifying which data comes from an external source. In other words, what is input? Robert mentions something else that I want to correct:

What this really points out once again is that web applications written in PHP do not really need to focus on much more than absolutely everything that a malicious attacker could throw at you through GET, POST or COOKIES (unless they have access to your server ENVIRONMENT ... *shudder*). Once again this means that if register_globals is turned off, these variables can only make their way in neatly packaged into corresponding $_GET, $_POST, and $_COOKIE arrays (as well as $_SESSION).

It is true that all data in $_GET, $_POST, and $_COOKIE is sent from the client and therefore tainted. However, data within $_SESSION is not. This data is persisted on the server and never even exposed over the Internet (unless you have a custom session handler that specifically does this). If you filter data on input, then you will never store tainted data in a session variable. Therefore, you can trust $_SESSION.
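The point about $_SESSION can be sketched as follows. This assumes a reasonably recent PHP with the filter extension available; the `remember_email()` function and the `email` field are hypothetical names chosen for the example. The session array is only written to after the client-supplied value has passed a filter, so anything read back from it later has already been checked.

```php
<?php
// Hypothetical helper: copy an email address from request data into the
// session store only if it passes validation. Returns true on success.
function remember_email(array $request, array &$session)
{
    $raw = isset($request['email']) ? $request['email'] : '';

    // filter_var() returns the value if it is valid, or false if not.
    $email = is_string($raw) ? filter_var($raw, FILTER_VALIDATE_EMAIL) : false;

    if ($email === false) {
        return false; // tainted data never reaches the session
    }

    $session['email'] = $email;
    return true;
}

// Typical use in a web request (left as a comment so that the sketch
// can also be run from the command line):
//
//     session_start();
//     remember_email($_POST, $_SESSION);
```

Passing the arrays in as parameters is only a convenience for trying the function outside a web request. In a real script you would work with $_POST and $_SESSION directly.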

$_SERVER contains a mixture. Some of this data is provided by the web server, and some is provided by the client. Try this simple quiz.

Where does the data in each of the following PHP variables originate?

  1. $_SERVER['DOCUMENT_ROOT']
  2. $_SERVER['HTTP_HOST']
  3. $_SERVER['REQUEST_URI']
  4. $_SERVER['SCRIPT_NAME']