About the Author

Chris Shiflett

Hi, I'm Chris, a web developer and a founding member of Analog. I live and work in Brooklyn, NY.


Character Type Functions

An oft-overlooked PHP extension is ctype - a collection of functions that can help you determine whether a string belongs to a particular character class, such as alphanumeric. This extension is built-in as of PHP 4.3.0, so you may not have to do anything special before you can start using it.

The ctype functions are particularly useful for handling $_GET and $_POST data - elements in these superglobal arrays arrive as strings (or as arrays of strings when the client uses bracket syntax such as name[]), and because they are sent by the client, you must treat them with suspicion.

Security-conscious PHP developers frequently use regular expressions to filter external data. While this is still the best approach in many cases, there are a few common character classes, such as alphanumeric, that are easier to filter with ctype functions.

A nice side-effect of using ctype functions is that they take locale into account. For example, I consider alphabetic characters to be [A-Za-z], but this isn't true everywhere. In fact, many common European names have characters that are not accounted for in my simplistic pattern.
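As a rough sketch of the locale point, the snippet below checks the same name under the default C locale and then under a German locale. The locale names and the ISO-8859-1 encoding of the test string are assumptions made for illustration; which locales are installed varies by system, so the second result is not guaranteed.

```php
<?php
// Candidate locale names are assumptions; availability varies by system.
$candidates = array('de_DE.ISO-8859-1', 'de_DE@euro', 'de_DE', 'German_Germany.1252');

// "Björn" encoded as ISO-8859-1, one byte per character.
$name = "Bj\xF6rn";

// Default C locale: only [A-Za-z] counts as alphabetic.
setlocale(LC_CTYPE, 'C');
var_dump(ctype_alpha($name));

// setlocale() returns the selected locale name, or false if none
// of the candidates is installed.
$locale = setlocale(LC_CTYPE, $candidates);
if ($locale !== false) {
    echo "Using locale: $locale\n";
    var_dump(ctype_alpha($name));
} else {
    echo "No matching locale installed\n";
}
```

Note that ctype functions work byte by byte, so a UTF-8 encoded name would need a different approach.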

Here is an example using ctype_alnum() that tests whether $_POST['username'] is alphanumeric:

<?php

$clean = array();

if (ctype_alnum($_POST['username'])) {
    $clean['username'] = $_POST['username'];
} else {
    /* Error */
}

?>
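A slightly more defensive variant of the same idea is sketched below. The helper name filter_alnum() is hypothetical and not part of the original example, and the sketch assumes PHP 7.1 or later for the nullable return type. It guards against three cases the short example does not: a missing key, a value that is an array rather than a string, and an empty string (which ctype functions accepted before PHP 5.1.0 and reject in later versions).

```php
<?php
/**
 * Hypothetical helper, not part of the original post: returns the value
 * stored under $key if it is a non-empty alphanumeric string, or null.
 */
function filter_alnum(array $input, string $key): ?string
{
    // Reject missing keys and non-string values such as username[]=x.
    if (!isset($input[$key]) || !is_string($input[$key])) {
        return null;
    }

    $value = $input[$key];

    // Make the empty-string decision explicit instead of relying on
    // version-specific ctype behavior.
    if ($value === '' || !ctype_alnum($value)) {
        return null;
    }

    return $value;
}

$clean = array();

$username = filter_alnum($_POST, 'username');
if ($username !== null) {
    $clean['username'] = $username;
}
```

Returning null rather than false keeps "not valid" distinct from any legitimate string value.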

There are plenty of cases where a regular expression is still best, but I think the ctype functions are worth a look.
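One small illustration of why a function can be easier to get right than a pattern: in PCRE, the $ anchor also matches just before a trailing newline, so a digits-only pattern written with ^ and $ accepts a value that ctype_digit() rejects. The input value here is made up for the example.

```php
<?php
// Made-up input with a trailing newline.
$input = "12345\n";

// '$' also matches just before a final newline, so this accepts the input.
var_dump(preg_match('/^[0-9]+$/', $input));   // int(1)

// '\z' anchors at the very end of the subject and rejects it.
var_dump(preg_match('/^[0-9]+\z/', $input));  // int(0)

// ctype_digit() checks every character, so there is no anchor to get wrong.
var_dump(ctype_digit($input));                // bool(false)
```

The pattern can of course be written correctly; the point is only that the ctype call leaves less room for this kind of slip.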

About This Post

Character Type Functions was posted on Sun, 19 Dec 2004 at 23:13:30 GMT.

5 Comments

1. Chris Shiflett said:

If you don't have the extension, you might be able to use these ctype compatibility functions:

http://cvs.php.net/co.php/pear/SQL_Parser/ctype.php

Sun, 19 Dec 2004 at 23:23:43 GMT


2. Aaron Wormus said:

In my recent article on string handling in International PHP Magazine, I had originally dedicated a whole section to the ctype library. However, the more I looked at it, the less useful it seemed for anything beyond the most basic validation.

The main problem with this library is the fact that it is not intended for handling strings. The C stands for Character, and the original functionality is to determine if a specific character is of the defined type.

In PHP we are trying to make it work with strings and the concept doesn't carry over very well.

If you could add a parameter that would check the string for an occurrence of that character type it could be much more powerful, because then we could do stuff like:

if (ctype_cntrl($_POST['username'], true)){
    die("Your username contains control characters");
}

The other problem with ctype (and string functions in general) is locale. By default on Unix systems, PHP will use the standard C locale which is [A-Za-z], so with your check Björn, Håkan and plenty of other Scandinavians wouldn't be able to sign in.

All that to say that after all my research all the ctype library got was a brief mention, since IMO it really isn't that useful.

The upside is that you can use POSIX named classes in your preg_* functions (AFAIK, this is undocumented), so the same simplistic example above can be done with:

if (preg_match('/[[:cntrl:]]/', $_POST['username'])){
    die("Your username contains control characters");
}

You won't get the speed that ctype offers, but at least it's much simpler than trying to work it all out yourself.

Mon, 20 Dec 2004 at 12:07:17 GMT


3. Chris Shiflett said:

> If you could add a parameter that would check the string for an occurrence
> of that character type it could be much more powerful, because then we
> could do stuff like:
>
> if (ctype_cntrl($_POST['username'], true)){
>     die("Your username contains control characters");
> }

That's a blacklist approach to data filtering, which I try to avoid. If you're using a whitelist approach, ctype functions become more useful. Of course, they're only useful in the specific cases where you want to guarantee that every character in a string belongs to a particular class, but these cases do exist.

In fact, I'm glad these functions behave as they do. If I were to test for an alphanumeric string as my example demonstrates, I would create a security vulnerability if the function declared a string to be alphanumeric whenever any single character in it is alphanumeric.

> The other problem with ctype (and string functions in general) is locale.
> By default on Unix systems, PHP will use the standard C locale which is
> [A-Za-z]

Yes, that's the default, so using ctype functions won't magically make your applications support multiple locales (a point I almost mentioned). However, you can control their behavior with the locale setting, which is nice, even if it's not the best possible solution. :-)

Mon, 20 Dec 2004 at 16:20:13 GMT


4. Aaron Wormus said:

> If I were to test for an alphanumeric as my example demonstrates, I would create a security vulnerability if the function declared something to be alphanumeric when any character in the string is alphanumeric.

I wasn't saying that the function should only return true if the character were present, but that the behaviour could be changed with a switch (notice the second parameter). This would make the functions more flexible and usable for things other than whitelisting.

If you want to whitelist based on one of the provided character sets (and locale) then you're in luck. But seeing that you can use the same character classes intelligently with a preg_match I'll say that ctype is next to useless :)

if (preg_match('/^[[:alpha:][:space:][:punct:]]+$/', $_POST['comment'])){
    echo "Look Mom, Whitelisting!";
}

The other problem with ctype_* is that they will return true on 0 length strings. So for your example to work as an adequate whitelist you would have to do a strlen to make sure that $_POST['username'] isn't empty.

Again, more work than it's worth.

Mon, 20 Dec 2004 at 18:15:39 GMT


5. Chris Shiflett said:

> I wasn't saying that the function should only return true if the character
> were present, but that the behaviour could be changed with a switch (notice
> the second parameter). This would make the functions more flexible and usable
> for things other than whitelisting.

I missed the second parameter. That's an idea worth suggesting, since flexibility is always good.

> preg_match('/^[[:alpha:][:space:][:punct:]]+$/', $_POST['comment'])

This illustrates why I think ctype functions are better in certain cases. You cannot match the clarity of a function with a pattern. Unnecessary complexity is an unnecessary risk.

Keep in mind that I state, "There are plenty of cases where a regular expression is still best." Attempting to note such cases doesn't really make me think less of ctype functions anyway. :-)

> The other problem with ctype_* is that they will return true on 0 length
> strings. So for your example to work as an adequate whitelist you would have
> to do a strlen to make sure that $_POST['username'] isn't empty.

A whitelist is a defined set of allowed elements. If any element is not in the whitelist, the entity to which it belongs is considered to be invalid. Considering an entity with no elements to be invalid is not a characteristic of a whitelist approach - it has nothing to do with data filtering, in fact.

Tue, 21 Dec 2004 at 08:55:45 GMT

