Google's XSS Vulnerability

21 Dec 2005

The recent cross-site scripting (XSS) vulnerability discovered in Google perfectly illustrates why character encoding matters. The following example demonstrates how to use PHP's htmlentities() function with the optional third argument that indicates the character encoding:

  <?php

  // $clean is assumed to hold input that has already been filtered.
  $html = array();

  // Escape for HTML output, naming the encoding explicitly.
  $html['username'] = htmlentities($clean['username'],
                                   ENT_QUOTES,
                                   'UTF-8');

  echo "<p>Welcome back, {$html['username']}.</p>";

  ?>

The example uses UTF-8, so this should be indicated in the Content-Type header:

  Content-Type: text/html; charset=UTF-8
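As a minimal sketch of keeping the two in step, the header and the escaping call can both read the encoding from a single place. The constant PAGE_CHARSET and the helper functions content_type_header() and escape_html() are hypothetical names introduced for illustration; only header(), headers_sent(), and htmlentities() are PHP's own.

```php
<?php

// Hypothetical constant: the one place where the page encoding is named.
const PAGE_CHARSET = 'UTF-8';

// Build the Content-Type header line for an HTML page in the given encoding.
function content_type_header(string $charset): string
{
    return "Content-Type: text/html; charset={$charset}";
}

// Escape a value for HTML output using the same encoding as the header.
function escape_html(string $value, string $charset): string
{
    return htmlentities($value, ENT_QUOTES, $charset);
}

// Headers must be sent before any output.
if (!headers_sent()) {
    header(content_type_header(PAGE_CHARSET));
}

echo '<p>Welcome back, ' . escape_html('Chris & "friends"', PAGE_CHARSET) . '.</p>';
```

Because both calls take the encoding from the same constant, changing it in one place cannot leave the header and the escaping function disagreeing.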

Researchers at Watchfire realized that Google does not indicate the character encoding. They also realized that a URL such as the following causes the data you send to be echoed back in the content of the response:

http://google.com/url?EVIL

You will see the following:

Forbidden

Your client does not have permission to get URL /url?EVIL from this server.

Google fails to handle attacks that use UTF-7, so all an attacker must do is target a browser that will interpret Google's response as a UTF-7 resource. This is possible because Google does not indicate the character encoding in its Content-Type entity header.

Unfortunately for Internet Explorer users (and Google), there is an auto-select option for encoding that, if set, will interpret a resource as UTF-7 if it finds a UTF-7 character in the first 4096 bytes. Because Google's response is so small, anything an attacker places in the URL falls well within that range, so the danger is clear.
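To see why escaping by itself does not help when the browser guesses the encoding, consider a minimal sketch. The payload below is an illustrative assumption, not the string Watchfire used: it spells the angle brackets as the UTF-7 sequences +ADw- and +AD4-, so it contains none of the characters that htmlentities() converts.

```php
<?php

// Illustrative payload (an assumption, not Watchfire's actual string):
// "<script>alert(1)</script>" with < and > written as UTF-7 sequences.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// Escape it as if it were ordinary UTF-8 text.
$escaped = htmlentities($payload, ENT_QUOTES, 'UTF-8');

// The string contains no <, >, &, or quotes, so nothing is converted.
var_dump($escaped === $payload);

// If the mbstring extension is available, decode the payload the way a
// browser that treats the page as UTF-7 would read it.
if (function_exists('mb_convert_encoding')) {
    echo mb_convert_encoding($payload, 'UTF-8', 'UTF-7'), "\n";
}
```

The escaped output is byte-for-byte identical to the input, which is why the response is only safe when the browser is told not to read it as UTF-7.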

The moral of the story is that you should always ensure character encoding consistency between your escaping function and the remote system to which you're sending data. In other words, specify the character encoding in htmlentities(), use mysql_real_escape_string() (which handles this for you), and so on.
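The same principle can be sketched for the database side. The mysql extension that provides mysql_real_escape_string() was removed in PHP 7, so this sketch swaps in the mysqli equivalents. The helper escape_for_mysql() is a hypothetical name, and the sketch assumes an already open mysqli connection and tables that store UTF-8 text.

```php
<?php

// Hypothetical helper, not from the original post. Assumes $link is an open
// mysqli connection and that the data is stored as UTF-8 (utf8mb4 in MySQL).
function escape_for_mysql(mysqli $link, string $value, string $charset = 'utf8mb4'): string
{
    // Tell the connection which encoding the data uses. The escaping
    // function below takes the connection's character set into account.
    if (!mysqli_set_charset($link, $charset)) {
        throw new RuntimeException("Could not set connection charset to {$charset}");
    }

    return mysqli_real_escape_string($link, $value);
}
```

Setting the character set through the API, rather than with a SET NAMES query, matters here because the escaping function only knows about the former.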

Google corrected this flaw earlier this month.