Character Encoding and XSS

29 May 2007

While lamenting Ronaldinho's red card and writing an overdue column for php|architect this weekend, I took a break to read Kevin Yank's latest post, Good and Bad PHP Code.

In the post, he provides a few useful PHP interview questions, including some questions from Yahoo as well as his personal favorite:

In your mind, what are the differences between good PHP code and bad PHP code?

He explains that good PHP code should be:

He also takes an example of bad PHP code and makes it better, producing this:

  <?php

  if (isset($_GET['query'])) {
      echo '<p>Search results for query: ',
           htmlspecialchars($_GET['query'], ENT_QUOTES),
           '.</p>';
  }

  ?>

In the comments, many additional improvements have been suggested, but there's one that has yet to be mentioned. When using htmlspecialchars() without specifying the character encoding, XSS attacks that use UTF-7 are possible. If you've been reading my blog for a while, you can probably put the pieces together yourself, so feel free to give it a go. The only obstacle is the fact that ENT_QUOTES causes all quotes to be escaped, and quotes are consistent between UTF-7 and ISO-8859-1, so you need an example exploit that doesn't use them:

  <script src=>

Web standards pedants might cringe, but this works in most browsers, despite the missing quotes, and the JavaScript returned by xss.js executes within the context of the current page.
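To see why the escaping doesn't help, here is a minimal sketch. The payload is a hypothetical one chosen for illustration (the alert(1) body is an assumption, not the exploit shown above); it relies only on the fact that UTF-7 writes < as +ADw- and > as +AD4-.

```php
<?php
// Hypothetical payload for illustration only. In UTF-7, "<" is
// written as "+ADw-" and ">" as "+AD4-".
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// htmlspecialchars() only rewrites &, <, >, " and ' and none of
// those characters appear in the UTF-7 form of the markup.
$escaped = htmlspecialchars($payload, ENT_QUOTES);

// Prints bool(true): the payload passes through unchanged.
var_dump($escaped === $payload);
```

A browser that decides the page is UTF-7 decodes those sequences back into angle brackets after PHP has already finished escaping.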

To try this out, save Kevin's example PHP code somewhere, then visit it in your browser with the following value in the query string:

  ?

This only works in browsers that automatically detect the character encoding, but you can mimic the situation by manually setting your browser to use UTF-7 or by sending a Content-Type header that does the same thing:

  <?php

  header('Content-Type: text/html; charset=UTF-7');

  ?>
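The flip side is that stating the character encoding consistently leaves the browser nothing to guess. The following is a minimal sketch, assuming the page is meant to be UTF-8. The escape_html() helper is a hypothetical name introduced here, not something from Kevin's post. The Content-Type header is what keeps the browser from detecting UTF-7, and passing the same encoding to htmlspecialchars() keeps the escaping in agreement with it.

```php
<?php
// Hypothetical helper (not from the original post): escapes a value
// for HTML output using an explicitly stated character encoding.
function escape_html(string $value, string $charset = 'UTF-8'): string
{
    return htmlspecialchars($value, ENT_QUOTES, $charset);
}

// Declare the same encoding to the browser so it has no reason to guess.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

if (isset($_GET['query'])) {
    echo '<p>Search results for query: ',
         escape_html($_GET['query']),
         '.</p>';
}
```

With the encoding declared, a UTF-7 payload in the query string should be displayed as the literal text it is rather than decoded into markup.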