About the Author

Chris Shiflett

Chris Shiflett is an author and speaker who leads the web application security practice at OmniTI.


Character Type Functions

An oft-overlooked PHP extension is ctype - a collection of functions that can help you determine whether a string belongs to a particular character class, such as alphanumeric. This extension is built-in as of PHP 4.3.0, so you may not have to do anything special before you can start using it.

The ctype functions are particularly useful for handling $_GET and $_POST data. Scalar values in these superglobal arrays always arrive as strings (a parameter named like name[] arrives as an array of strings), and because they are sent by the client, you must treat them with suspicion.

Security-conscious PHP developers frequently use regular expressions to filter external data. While this is still the best approach in many cases, there are a few common character classes that are easier to filter with ctype functions.
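As a rough illustration of that trade-off, the sketch below places three ctype checks next to regular expressions that should behave the same way under the default C locale. The classify_string() helper, its array keys, and the sample inputs are hypothetical names invented for this comparison, and the patterns are just one way to write the equivalent tests.

```php
<?php
// Hypothetical helper for illustration; not part of the ctype extension.
function classify_string($value)
{
    return array(
        // ctype checks: true only if every character is in the class.
        'digit'       => ctype_digit($value),
        'alpha'       => ctype_alpha($value),
        'alnum'       => ctype_alnum($value),
        // Regex counterparts, assuming the C locale. \A and \z anchor the
        // whole string; a bare $ would also accept a trailing newline.
        'digit_regex' => preg_match('/\A[0-9]+\z/', $value) === 1,
        'alpha_regex' => preg_match('/\A[A-Za-z]+\z/', $value) === 1,
        'alnum_regex' => preg_match('/\A[A-Za-z0-9]+\z/', $value) === 1,
    );
}

// Sample inputs chosen for this sketch.
foreach (array('12345', 'abc123', 'abc 123') as $sample) {
    var_dump($sample, classify_string($sample));
}
```

The function calls say what they test by name, while the patterns have to be read character by character to confirm they are anchored correctly.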

A nice side-effect of using ctype functions is that they take locale into account. For example, I consider alphabetic characters to be [A-Za-z], but this isn't true everywhere. In fact, many common European names have characters that are not accounted for in my simplistic pattern.
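A minimal sketch of that locale dependence follows. It assumes the input is encoded in a single-byte character set such as ISO-8859-1, since ctype functions examine one byte at a time, and it assumes a matching locale may or may not be installed on the server. The helper name alpha_in_first_available_locale() and the locale names passed to it are illustrative guesses, not values from this post.

```php
<?php
// Hypothetical helper: test $text with ctype_alpha() under the first
// locale from $locales that the system accepts, then restore the old one.
// Returns null when none of the requested locales is available.
function alpha_in_first_available_locale($text, array $locales)
{
    // Passing '0' only queries the current LC_CTYPE setting.
    $previous = setlocale(LC_CTYPE, '0');

    if (setlocale(LC_CTYPE, $locales) === false) {
        return null;
    }

    $result = ctype_alpha($text);
    setlocale(LC_CTYPE, $previous);

    return $result;
}

// "Björn" encoded as ISO-8859-1 (0xF6 is the byte for ö).
$name = "Bj\xF6rn";

// Locale names vary by platform; these are assumptions.
$candidates = array('sv_SE.ISO-8859-1', 'sv_SE.ISO8859-1', 'sv_SE');

var_dump(alpha_in_first_available_locale($name, $candidates));
```

Because available locale names differ between systems, the helper reports null rather than guessing when no candidate can be set.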

Here is an example using ctype_alnum() that tests whether $_POST['username'] is alphanumeric:

<?php

$clean = array();

if (ctype_alnum($_POST['username']))
{
    $clean['username'] = $_POST['username'];
}
else
{
    /* Error */
}

?>

There are plenty of cases where a regular expression is still best, but I think the ctype functions are worth a look.
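If you use the ctype_alnum() check from the example above in several places, it could be wrapped in a small helper. The sketch below is one possible shape, not a recommended standard: filter_alnum() is a hypothetical name, the $input array stands in for $_POST, and rejecting empty values is an assumption made here because the result of ctype functions for an empty string has differed between PHP versions.

```php
<?php
// Hypothetical helper: return $value if it is a non-empty string made up
// entirely of alphanumeric characters, otherwise return null.
function filter_alnum($value)
{
    // Request data can also be an array (e.g. username[]=x), so check the type.
    if (!is_string($value) || $value === '') {
        return null;
    }

    return ctype_alnum($value) ? $value : null;
}

$clean = array();

// Stand-in for $_POST so the sketch runs on its own.
$input = array('username' => 'chris2004');

$username = filter_alnum(isset($input['username']) ? $input['username'] : null);

if ($username !== null) {
    $clean['username'] = $username;
} else {
    /* Error */
}
```

Whether an empty value should be rejected at this point is an application decision; the helper simply makes that choice explicit.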

About This Post

Character Type Functions was posted on Sun, 19 Dec 2004 at 23:13:30 GMT.

5 Comments

1. Chris Shiflett said:

If you don't have the extension, you might be able to use these ctype compatibility functions:

http://cvs.php.net/co.php/pear/SQL_Parser/ctype.php

Sun, 19 Dec 2004 at 23:23:43 GMT


2. Aaron Wormus said:

In my recent article on string handling in International PHP Magazine, I had originally dedicated a whole section to the ctype library. However, the more I looked at it, the less useful it became for anything more than the most basic validation.

The main problem with this library is the fact that it is not intended for handling strings. The C stands for Character, and the original functionality is to determine if a specific character is of the defined type.

In PHP we are trying to make it work with strings and the concept doesn't carry over very well.

If you could add a parameter that would check the string for an occurrence of that character type it could be much more powerful, because then we could do stuff like:

if (ctype_cntrl($_POST['username'], true)){
    die("Your username contains control characters");
}

The other problem with ctype (and string functions in general) is locale. By default on Unix systems, PHP will use the standard C locale which is [A-Za-z], so with your check Björn, Håkan and plenty of other Scandinavians wouldn't be able to sign in.

All that to say that after all my research all the ctype library got was a brief mention, since IMO it really isn't that useful.

The upside is that you can use POSIX named classes in your preg_* functions (AFAIK, this is undocumented), so the same simplistic example above can be done with:

if (preg_match('/[[:cntrl:]]/', $_POST['username'])){
    die("Your username contains control characters");
}

You won't get the speed that ctype offers, but at least it's much simpler than trying to work it all out yourself.

Mon, 20 Dec 2004 at 12:07:17 GMT


3. Chris Shiflett said:

> If you could add a parameter that would check the string for an occurrence
> of that character type it could be much more powerful, because then we
> could do stuff like:
>
> if (ctype_cntrl($_POST['username'], true)){
>     die("Your username contains control characters");
> }

That's a blacklist approach to data filtering, which I try to avoid. If you're using a whitelist approach, ctype functions become more useful. Of course, they're only useful in the specific cases where you want to guarantee that every character in a string belongs to a particular class, but these cases do exist.

In fact, I'm glad these functions behave as they do. If I were to test for an alphanumeric as my example demonstrates, I would create a security vulnerability if the function declared something to be alphanumeric when any character in the string is alphanumeric.

> The other problem with ctype (and string functions in general) is locale.
> By default on Unix systems, PHP will use the standard C locale which is
> [A-Za-z]

Yes, that's the default, so using ctype functions won't magically make your applications support multiple locales (a point I almost mentioned). However, you can control their behavior with the locale setting, which is nice, even if it's not the best possible solution. :-)

Mon, 20 Dec 2004 at 16:20:13 GMT


4. Aaron Wormus said:

> If I were to test for an alphanumeric as my example demonstrates, I would create a security vulnerability if the function declared something to be alphanumeric when any character in the string is alphanumeric.

I wasn't saying that the function should only return true if the character were present, but that the behaviour could be changed with a switch (notice the second parameter). This would make the functions more flexible and usable for things other than whitelisting.

If you want to whitelist based on one of the provided character sets (and locale) then you're in luck. But seeing that you can use the same character classes intelligently with a preg_match I'll say that ctype is next to useless :)

if (preg_match('/^[[:alpha:][:space:][:punct:]]+$/', $_POST['comment'])){
    echo "Look Mom, Whitelisting!";
}

The other problem with ctype_* is that they will return true on 0 length strings. So for your example to work as an adequate whitelist you would have to do a strlen to make sure that $_POST['username'] isn't empty.

Again, more work than it's worth.

Mon, 20 Dec 2004 at 18:15:39 GMT


5. Chris Shiflett said:

> I wasn't saying that the function should only return true if the character
> were present, but that the behaviour could be changed with a switch (notice
> the second parameter). This would make the functions more flexible and usable
> for things other than whitelisting.

I missed the second parameter. That's an idea worth suggesting, since flexibility is always good.

> preg_match('/^[[:alpha:][:space:][:punct:]]+$/', $_POST['comment'])

This illustrates why I think ctype functions are better in certain cases. You cannot match the clarity of a function with a pattern. Unnecessary complexity is an unnecessary risk.

Keep in mind that I state, "There are plenty of cases where a regular expression is still best." Attempting to note such cases doesn't really make me think less of ctype functions anyway. :-)

> The other problem with ctype_* is that they will return true on 0 length
> strings. So for your example to work as an adequate whitelist you would have
> to do a strlen to make sure that $_POST['username'] isn't empty.

A whitelist is a defined set of allowed elements. If any element is not in the whitelist, the entity to which it belongs is considered to be invalid. Considering an entity with no elements to be invalid is not a characteristic of a whitelist approach - it has nothing to do with data filtering, in fact.

Tue, 21 Dec 2004 at 08:55:45 GMT

