Input Filtering

Published in PHP Architect on 18 May 2004

Welcome to another issue of Security Corner. This month's topic is input filtering, one of the cornerstones of web application security. Input filtering is the method by which you validate all incoming data and prevent any invalid data from being used by your application. It's very similar in theory to how water filtering works, where impurities in water are not allowed to pass.

This article covers a variety of issues, but unlike previous Security Corners, I will be focusing more on the theoretical aspects than the practical. Understanding why and where to filter is more important than understanding how. By the end, you should be able to better design your applications with security in mind.

Spoofed Form Submissions

When you write code that is expecting data from the client, you're usually processing a form. It's important to appreciate just how easily a form submission can be spoofed, so that you realize that absolutely nothing about the client's request can be blindly trusted.

Consider the following form located at example.org:

  <form action="/receive.php" method="POST">
  <select name="color">
      <option value="red">red</option>
      <option value="green">green</option>
      <option value="blue">blue</option>
  </select>
  <input type="submit" />
  </form>

When a user selects red and submits the form, a request similar to the following is sent:

  POST /receive.php HTTP/1.1
  Host: example.org
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 9

  color=red

There are two common ways to spoof such a form. One is to recreate the HTML markup for the form, using absolute URLs instead of relative ones:

  <form action="http://example.org/receive.php" method="POST">
  <input type="text" name="color" />
  <input type="submit" />
  </form>

An easy way to create such a form is to save the HTML from the real site and substitute URLs where appropriate (this task can be automated to make things easier). The fake form can reside anywhere, because the request will still be sent to example.org due to the absolute URL specified by the action attribute.

Whereas the real form restricts the color to one of three choices, this new form has no restrictions and makes it convenient for an attacker to practice submitting various values for color in an attempt to subvert your application. If the attacker types red into the text field and submits the form, the request will be exactly the same as in the previous example. Of course, the most important point is that, in both cases, the request is coming from the client. Thus, you have no control over what is sent and must make sure that the color is one of the expected values.
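A minimal sketch of such a check follows. The function name valid_color() matches the one used later in this article, but its body here is my assumption about what a reasonable implementation looks like, not a definitive one:

```php
<?php

// Hypothetical helper: checks a value against the three colors
// that the real form can legitimately send.
function valid_color($value): bool
{
    $allowed = array('red', 'green', 'blue');

    // The third argument requests strict comparison, which avoids
    // surprises from PHP's loose type juggling.
    return is_string($value) && in_array($value, $allowed, true);
}

var_dump(valid_color('red'));        // bool(true)
var_dump(valid_color('purple'));     // bool(false)
var_dump(valid_color(array('red'))); // bool(false): color[]=red arrives as an array
```

The check lists what is allowed rather than what is forbidden, so anything the spoofed text field can produce, other than the three expected strings, is rejected.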

A more direct method to spoof the form is to manually enter the POST request. If you telnet to port 80 (for standard HTTP) on the target host, you have complete flexibility. Here is an example from a standard shell prompt:

  $ telnet example.org 80
  Trying 192.0.34.166...
  Connected to example.org.
  Escape character is '^]'.
  POST /receive.php HTTP/1.1
  Host: example.org
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 9

  color=red

  HTTP/1.1 404 Not Found
  ...

Because receive.php is a fictional resource, this generates a 404 response. I encourage you to try this with real forms, so you can appreciate the ease and power of this approach.
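The same request can be assembled by a script just as easily as it can be typed. The following is only a sketch: build_post_request() is a hypothetical helper of my own, and the Connection: close header is an addition that is not part of the request shown above (it simply asks the server to hang up after responding).

```php
<?php

// Hypothetical helper: builds the raw text of a form-encoded POST request.
function build_post_request(string $host, string $path, array $fields): string
{
    // http_build_query() URL-encodes the fields, e.g. color=red
    $body = http_build_query($fields);

    return "POST {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . 'Content-Length: ' . strlen($body) . "\r\n"
         . "Connection: close\r\n"
         . "\r\n"
         . $body;
}

$request = build_post_request('example.org', '/receive.php', array('color' => 'red'));
echo $request, "\n";

// Sending it takes only a few more lines (commented out so the sketch runs offline):
// $socket = fsockopen('example.org', 80, $errno, $errstr, 5);
// if ($socket !== false) {
//     fwrite($socket, $request);
//     echo stream_get_contents($socket);
//     fclose($socket);
// }
```

Nothing in this script involves a browser or a form, and the value of color can be anything the attacker chooses.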

If the method of the form is GET rather than POST, the request resembles the following instead:

  GET /receive.php?color=red HTTP/1.1
  Host: example.org

As with the POST request, this can be spoofed manually or with a fake form. However, it is much easier to simply type the desired URL into a browser, so the other approaches are unnecessary. This additional convenience should not mislead you into believing that POST requests are more secure.

It should be clear that a dedicated attacker has complete control over the HTTP request that is processed by your application. In fact, it's best to not think of requests as being form submissions, since the use of a form is actually unnecessary.

Of course, the attacker could also try sending unexpected variables, but unless your code uses them, there is virtually no risk. This is a key point: as long as you filter all input that you use, you have a good design (implementation might be another matter). But if you leave any input that you use unfiltered, an attacker has an opportunity to compromise your application.

Register Globals

In PHP 4.2.0, the default setting for register_globals changed from On to Off. This change is regarded as one of the most controversial in PHP's history. There is also quite a bit of misinformation being spread about register_globals and its inherent insecurity. Much of this misinformation unjustly blames register_globals for poor programming.

Some people, myself included, argue that it is possible to develop secure PHP applications with register_globals enabled. This is absolutely true, although enabling it does present a heightened security risk. A mistake is much more dangerous and likely easier to exploit when register_globals is enabled.

With register_globals enabled, it becomes necessary to filter or initialize all data prior to use, assuming it to be tainted otherwise, because any variable can potentially be overwritten by input. This is a good practice, even when register_globals is disabled.

A common example of a security vulnerability is the assumption that a variable cannot exist without being explicitly set in the code:

  <?php

  if (validate_user()) {
      $validated = TRUE;
  }

  /* ... */

  if ($validated) {
      /* Sensitive Activity */
  }

  ?>

It is easy enough for an attacker to add validated=1 to the URL and bypass the second check (and anything else that relies on $validated). Of course, this is not possible with register_globals disabled, but it is also not possible with better coding practices: $validated should be initialized to FALSE.

With error_reporting set to a sufficiently high level (E_ALL will do the trick), this code generates a notice about an undefined variable. It is a good practice to always initialize variables (and to develop with error_reporting set to E_ALL to help catch yourself when you forget).
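To see what PHP reports, here is a small sketch that runs the uninitialized check inside a function and collects the diagnostics in an array. The names validate_user_stub() and sensitive_activity_allowed() are hypothetical stand-ins of my own, and the error handler exists only so the sketch can inspect the messages. The exact severity and wording of the message depend on your PHP version (a notice in older releases, a warning as of PHP 8).

```php
<?php

// Report everything during development.
error_reporting(E_ALL);

// Collect diagnostics in an array instead of printing them, purely so
// that this sketch can inspect what PHP reported.
$reported = array();
set_error_handler(function ($severity, $message) use (&$reported) {
    $reported[] = $message;
    return true; // handled; skip PHP's built-in handler
});

// Hypothetical stand-in for validate_user(); it always fails here.
function validate_user_stub(): bool
{
    return false;
}

function sensitive_activity_allowed(): bool
{
    if (validate_user_stub()) {
        $validated = true;
    }

    // $validated was never initialized on this path, so PHP reports it.
    return (bool) $validated;
}

$allowed = sensitive_activity_allowed();
restore_error_handler();

print_r($reported);
```

A single message about the undefined variable ends up in $reported, which is exactly the reminder you want while developing.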

Timing

How can initializing variables protect you? Consider a slight modification to the previous example:

  <?php

  $validated = FALSE;

  if (validate_user()) {
      $validated = TRUE;
  }

  /* ... */

  ?>

With this code, it is impossible for $validated to be TRUE unless validate_user() returns TRUE (regardless of the register_globals setting). If register_globals is enabled, and this script is accessed with validated=1 in the URL, the sequence of events is as follows:

  1. Request with validated=1 in the URL is sent.
  2. $validated is created with a value of 1.
  3. Your code begins execution.
  4. $validated is set to FALSE.
  5. ...

This sequence shows that you are in complete control: by the time the first line of your code is executed, the user has finished sending the request and can do nothing else. Thus, as soon as you initialize a variable, you can be assured that the user cannot directly manipulate it. Use this to your advantage.
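The sequence above can be simulated in a few lines. Note the assumptions: register_globals itself was removed in PHP 5.4.0, so this sketch uses extract() to imitate its effect, and handle_request() and fake_validate_user() are hypothetical names of my own. It is an illustration of the timing, not of how PHP implemented the feature.

```php
<?php

// Hypothetical stand-in for validate_user(); it always fails here.
function fake_validate_user(): bool
{
    return false;
}

// Simulates one request. $query plays the role of the URL's query string.
function handle_request(array $query, bool $initialize): bool
{
    // Steps 1-2 (ASSUMPTION): extract() stands in for register_globals by
    // turning each request parameter into a variable before "your code" runs.
    extract($query);

    // Steps 3-4: your code begins and, optionally, initializes the variable.
    if ($initialize) {
        $validated = false;
    }

    if (fake_validate_user()) {
        $validated = true;
    }

    // Without initialization, whatever the request injected is still here.
    return !empty($validated);
}

$attack = array('validated' => '1');

var_dump(handle_request($attack, false)); // bool(true): the injected value survives
var_dump(handle_request($attack, true));  // bool(false): initialization overwrote it
```

The request is identical in both calls. The only difference is whether your code assigns a value after the input has arrived.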

Where Is the Trust?

There has to be a certain amount of trust, or else your application can do nothing. The key is to understand where you are placing that trust. Never trust the client, as the mantra goes, but how can you be sure that you're not?

One way is to rely on the superglobal arrays such as $_GET, $_POST, and $_COOKIE to make the data's origin very clear in your code.

Another good practice is to initialize an array in which you store all data that is safe to be used. This can include data that the application generates itself as well as input from remote sources that has been proven valid.

Design

Everything presented thus far should inform your application's design. If you fail to design with security in mind, you're doomed to be patching security holes for eternity. One primary concern needs to be input filtering, and a good design makes it easy for developers to distinguish safe data from potentially tainted data.

As suggested in the previous section, a naming convention for the array of safe data can be helpful:

  <?php

  $clean = array();

  /* isset() guards against a request that omits color entirely */
  if (isset($_POST['color']) && valid_color($_POST['color'])) {
      $clean['color'] = $_POST['color'];
  }

  ?>

A developer can get into the habit of assuming everything that's not in $clean is tainted. Good habits are valuable.

Another key to a successful design is to make certain that input filtering cannot be missed. Achieving this depends entirely upon your design, but if you initialize your variables and enforce a naming convention, any flaw in your design will cause a variable to be empty rather than have an arbitrary value set by an attacker.
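One way to approach this, offered only as a sketch, is to route all input through a single function that lists every expected field alongside its check. The function filter_request(), the quantity field, and the specific rules are all hypothetical; your own design will differ.

```php
<?php

// Hypothetical central filter: every expected field is listed once, with
// the check that its value must pass. Fields and rules are illustrative.
function filter_request(array $raw): array
{
    $rules = array(
        'color' => function ($value) {
            return is_string($value)
                && in_array($value, array('red', 'green', 'blue'), true);
        },
        'quantity' => function ($value) {
            // One to three digits and nothing else.
            return is_string($value)
                && preg_match('/\A[0-9]{1,3}\z/', $value) === 1;
        },
    );

    $clean = array();

    foreach ($rules as $field => $is_valid) {
        // Initialize first, so a missing or invalid value is empty,
        // never an arbitrary value chosen by the client.
        $clean[$field] = null;

        if (isset($raw[$field]) && $is_valid($raw[$field])) {
            $clean[$field] = $raw[$field];
        }
    }

    return $clean;
}

$clean = filter_request(array(
    'color'    => 'blue',
    'quantity' => '10; DROP TABLE orders',
    'admin'    => '1', // unexpected field: never copied into $clean
));

var_dump($clean);
```

In a real script the argument would be $_POST or $_GET. Here color passes, quantity fails its check and stays NULL, and admin never appears in $clean at all, so a forgotten rule results in missing data rather than tainted data.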

Until Next Time...

Input filtering is possibly the most important topic that I will cover here in Security Corner, and it is likely to be covered again (perhaps with more of a focus on practical implementations). If you design applications with a focus on how data enters the system and is validated, you're far less likely to experience an endless series of security holes.

It is easier to forgive a developer whose input filtering has weaknesses than one who completely fails to filter input at all. Hopefully you now understand the importance of this step and will never skip it.

Until next month, be safe.