About the Author

Chris Shiflett

Hi, I’m Chris: web craftsman, community leader, husband, father, and partner at Fictive Kin.


Google XSS Example

In the comments to my previous blog post, Ivo Jansch asks:

To be able to comprehend how this may affect my website, could you explain how this could be exploited, even though you cannot demonstrate it?

Rather than offer another vague answer, I decided to provide a very simple proof of concept that demonstrates how character encoding inconsistencies can bite you. Google's vulnerability has of course been fixed, but with a simple PHP script, we can reproduce the situation:

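<?php

// Declare the page as UTF-7 so the browser decodes the body as UTF-7.
header('Content-Type: text/html; charset=UTF-7');

// Encode the payload as UTF-7; the angle brackets are no longer literal characters.
$string = "<script>alert('XSS');</script>";
$string = mb_convert_encoding($string, 'UTF-7');

// The encoded string contains no literal < or >, so htmlentities() leaves the markup untouched.
echo htmlentities($string);

?>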

If you run this PHP script, you should see a popup window.

Although the output is escaped with htmlentities(), the JavaScript is still executed by the browser.

The example attack is a UTF-7 string (I just use mb_convert_encoding() for this demonstration), and the browser interprets the page as UTF-7 due to the Content-Type header. Internet Explorer makes this assumption automatically (thus, you can remove the explicit header() call), but this example should work in any browser that supports UTF-7.
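To make the mismatch concrete, here is a minimal sketch that compares what an escaping function sees with what a UTF-7 decoder sees. It assumes the mbstring extension is available and uses mb_convert_encoding() only as a stand-in for the browser's decoder. The literal string is the UTF-7 output Josh Dechant quotes in comment 5, and the variable names are illustrative.

```php
<?php
// UTF-7 form of <script>alert('XSS');</script>, as quoted in comment 5.
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

// What an escaping function working in ISO-8859-1 or UTF-8 sees:
// plain ASCII with none of the characters it would normally escape.
$hasMarkup = strpbrk($utf7, '<>&"') !== false;

// What a UTF-7 decoder sees (mbstring stands in for the browser here).
$decoded = mb_convert_encoding($utf7, 'UTF-8', 'UTF-7');

var_dump($hasMarkup);
echo $decoded, "\n";
```

The escaping side finds nothing to neutralize, while the decoding side recovers the original script tag.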

Hopefully developers will begin to appreciate the necessity of character encoding consistency. If anyone ever tries to claim that it doesn't matter, you can point them here. :-)

About this post

Google XSS Example was posted on Wed, 21 Dec 2005.

40 comments

1.Mike (SpikeZ - Sitepoint) said:

Hi Chris,

Good article and very helpful.

Shows a glaring weakness in many 'secure' sites.

Cheers

Mike

Wed, 21 Dec 2005 at 20:47:24 GMT Link


2.Alex said:

Once upon a time there was a security expert who-must-not-be-named.

Day after day he consulted with others and guided them in preventing XSS. He likes to criticize everyone else and hates it when someone does the same to him. Then he starts to cry and delete comments, so the next time he criticizes someone for insecurity he still has a clean record.

Some (let's call them Death Eaters) try to protect him.

But as in every fairy tale, some day there will be a happy ending, and he-who-must-not-be-named will fall.....

Wed, 21 Dec 2005 at 21:19:55 GMT Link


3.Ivo Jansch said:

Thanks Chris, the example is very clear now :)

Wed, 21 Dec 2005 at 21:56:14 GMT Link


4.Ilia Alshanetsky said:

Actually, the problem is not limited to Internet Explorer; Mozilla Firefox 1.5 exhibits the exact same behaviour.

If you enable automatic character set detection, either browser will trigger the XSS without the call to the header() function. The difference is that in Firefox, the auto-detection needs to be configured to detect UTF-7 to trigger the header-less problem. If it is not, the exploit does not happen.

Wed, 21 Dec 2005 at 23:35:53 GMT Link


5.Josh Dechant said:

Try as I might, this exploit does not work in Safari. It does in Firefox Mac, but Safari won't have it and just prints out "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-"

Thu, 22 Dec 2005 at 14:18:45 GMT Link


6.Chris Shiflett said:

Hi Josh,

I just tried in Safari and got the same results.

If you go to View > Text Encoding, you'll see that UTF-7 is not an encoding that Safari supports, so that's probably why.

Thu, 22 Dec 2005 at 14:33:14 GMT Link


7.Josh Dechant said:

Chris, yes, after I posted I noticed that Safari does not support UTF-7. But if you think about it, it really does make sense for a browser to not support UTF-7 since it is largely meant for email. Though Mail.app doesn't have UTF-7 either, nor does TextEdit or any other Apple app that I can see, so it seems that Apple has made a blanket decision to not support UTF-7 for one reason or another.

Ironically enough, though, after removing the explicit header, the dead IE 5 Mac does not automatically set the encoding to UTF-7... And unless you really dig into Firefox, it won't autodetect UTF-7 either, so while I agree we should be consistent in our encoding, this seems to be equally an XSS exploit and an IE PC exploit.

Thu, 22 Dec 2005 at 14:49:20 GMT Link


8.DewChugr said:

As someone fairly new to PHP and web programming I have read several of your articles and I have to say they are filled with great information. It's nice that you share this stuff with everyone. I don't really get all this UTF stuff yet so I'll google around and try to learn some more.

Thanks

Thu, 22 Dec 2005 at 22:26:17 GMT Link


9.Chris Shiflett said:

Thanks for the kind words. I really appreciate it. :-)

Andrei Zmievski is working on PHP's Unicode support and has given some good talks on the topic:

http://www.gravitonic.com/talks/

Wikipedia has a pretty good description of UTF-7:

http://en.wikipedia.org/wiki/UTF-7

Skip down to the description, as I think it's the most informative.

Hope that helps!

Thu, 22 Dec 2005 at 23:29:51 GMT Link


10.DewChugr said:

Thanks for the links, very helpful.

btw, this blog/commenting system. Did you code it or is it a package? It's very slick...

Fri, 23 Dec 2005 at 14:54:53 GMT Link


11.Harry Fuecks said:

Interesting.

UTF-7 would be considered "valid" by common techniques used to validate encodings e.g. this regex: http://www.w3.org/International/questions/qa-forms-utf-8 - UTF-7 would pass, being in the ASCII range.

So we're saying the only time this is a risk is if it's left up to the browser to guess the encoding? I.e. make sure you declare the charset with a header or a meta tag?
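A small sketch of the point about validation, assuming the mbstring and PCRE extensions are available: because a UTF-7 payload is pure ASCII, two common "is this valid UTF-8?" checks both accept it. The string is the UTF-7 output quoted in comment 5, and the variable names are illustrative.

```php
<?php
// UTF-7 form of <script>alert('XSS');</script>, as quoted in comment 5.
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

// Check 1: mbstring's encoding validation.
$validByMbstring = mb_check_encoding($utf7, 'UTF-8');

// Check 2: the PCRE /u modifier, which fails on invalid UTF-8 subjects.
$validByPcre = preg_match('//u', $utf7) === 1;

var_dump($validByMbstring); // bool(true)
var_dump($validByPcre);     // bool(true)
```

So validating the encoding of input does not by itself rule out this payload; what matters is which encoding the browser ends up using for the page.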

Sun, 25 Dec 2005 at 01:00:42 GMT Link


12.joh said:

if you want to learn about Unicode, you should start here:

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

http://joelonsoftware.com/articles/Unicode.html

Sun, 25 Dec 2005 at 08:51:55 GMT Link


13.Miggy said:

That was shown on http://ha.ckers.org/xss.html a while back too.

Mon, 02 Jan 2006 at 00:10:45 GMT Link


14.Paul Davey said:

Is it because the htmlentities function is not compatible with utf-7 and hence does not alter the HTML? I did this for example (after your code):

if (htmlentities($string) == $string) {
    echo "NO CHANGE!";
}

And it displayed NO CHANGE

Sun, 15 Jan 2006 at 16:36:52 GMT Link


15.Chris Shiflett said:

Harry, I would add one more thing - we should indicate the charset in both our Content-Type headers and in our htmlentities() calls.

The previous post might make this clearer and provides an example:

http://shiflett.org/archive/177

Paul, this example works because htmlentities() assumes ISO-8859-1 by default, but the content is UTF-7. This mismatch causes htmlentities() to misinterpret characters.

We want to make sure our escaping functions and the remote systems to which we're sending data interpret data consistently, otherwise vulnerabilities like this are possible.
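As a rough sketch of what that consistency could look like in practice: one constant names the encoding, and both the Content-Type header and the escaping call read from it. APP_CHARSET and escape_html() are hypothetical names chosen for illustration, and ENT_QUOTES is an assumption about the desired flags.

```php
<?php
// Hypothetical constant: the single source of truth for the page encoding.
const APP_CHARSET = 'UTF-8';

// Hypothetical helper: escape output using the same encoding the header declares.
function escape_html(string $value): string
{
    return htmlentities($value, ENT_QUOTES, APP_CHARSET);
}

// Declare the encoding to the browser (guarded in case output has already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . APP_CHARSET);
}

echo escape_html("<script>alert('XSS');</script>"), "\n";
```

Because the header and the escaping function share one value, they cannot drift apart when the encoding is changed later.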

Sun, 15 Jan 2006 at 16:54:05 GMT Link


16.Luis said:

the example is sui generis because UTF-7 is not supported by htmlentities(). passing UTF-8 to it avoids the popup but you see no text either (but looking at the source you see something like the safari output above). on the other hand if the encoding had been UTF-8, not passing it to htmlentities() would not cause a popup either, probably because for this example UTF-8 and ISO-8859-1 are the same. so the example is good but not perfect.

Thu, 02 Feb 2006 at 16:52:59 GMT Link


17.Chris Shiflett said:

Hi Luis,

The example is only meant to reproduce Google's XSS vulnerability and highlight the importance of character encoding consistency. It's not a contrived example.

The idea isn't to use UTF-7 but to show that it's worth being explicit about which character encoding you're using. By specifying this in the Content-Type header and the htmlentities() function call, you're protected from these types of vulnerabilities.

Thu, 02 Feb 2006 at 17:20:14 GMT Link


18.Steph said:

In regards to the comment from "Alex": I have no idea who you are, but wtf are you doing posting your trivial life problems with someone in the comments? I think it would be best if the blogger deleted your comment and you stuck to something on topic regarding character encoding and XSS issues, not some useless "I want to sound smart with my metaphors" comments.

If you do have a valid point, then post a link to somewhere where you might have a sensible discussion of the issues, not some fairy tale garbage.

I'd like to point out that I am all for your useless comments being removed; they serve no purpose on this blog.

Also note that I will not be offended if the blogger does not wish this comment section to become a war of words and deletes my comment. It's his blog and he should make the decisions based on what he thinks is best for his readers.

Wed, 01 Mar 2006 at 05:04:53 GMT Link


19.Nate Klaiber said:

Steph,

I wouldn't get too worked up over 'Alex' - people like this, who hide behind the Internet, lose all respect and credibility anyway. So, the whiny attitude and comment approach he has taken holds no water, and deserves to be removed.

C'mon, if you are going to act tough - at least post some contact information. Some people need to grow up, period. I say delete the comments, they are useless and provide nothing of worth here - He just needs to go back to hardened-php...er...um....

Back on topic - I have tried this (as others have) in Safari and have been unable to duplicate any results (but I understand some may have been fixed, etc). I still want to see more of this with working examples - so I will test it later on a local machine. I understand what is being said, but sometimes it's easier to see HOW it could maliciously affect you if ignored.

Thanks for the great information....

Wed, 01 Mar 2006 at 15:25:49 GMT Link


20.Chris Shiflett said:

I don't like to delete comments, so I try to only do so when they're spam, off-topic, or flagrant.

Nate, you won't be able to try this in Safari, because it doesn't support UTF-7. In browsers that automatically detect the encoding, you can remove the header() call, which more closely resembles the problem Google was having.

Wed, 01 Mar 2006 at 15:53:54 GMT Link


21.Steven Roddis said:

Please note table.2:

http://www.php.net/htmlentities

It shows the character sets that are supported in PHP 4.3.0 and later. UTF-7 is not one of them, therefore htmlentities() is not going to escape it.

So the problem is that the developer did not understand which character sets are supported and which aren't.

Steven

Wed, 19 Apr 2006 at 06:31:36 GMT Link


22.its not important to know my name said:

What exactly was the worst-case possibility of Google's vulnerability? How could it have been used to bypass security?

Sat, 22 Apr 2006 at 17:43:51 GMT Link


23.Michael said:

Editorial translation/audio

Fri, 04 Aug 2006 at 00:17:16 GMT Link


24.Mikispag said:

Great tip! Thanks!

Mon, 27 Nov 2006 at 15:35:27 GMT Link


25.SEO Blog said:

Very good information for securing a website. I will check this. Thanks!

Sat, 16 Dec 2006 at 18:02:48 GMT Link


26.Bourse said:

I never imagined that character encoding could be that important! I am not an expert in web programming and I am new to the subject of cross-site scripting (actually I am currently developing a website and this is the first time I am doing the coding entirely on my own), so a big thanks from me for sharing such useful information on your site. I have already read several articles here that really came in handy.

Sun, 11 Mar 2007 at 15:43:25 GMT Link


27.Tereska said:

Replace last line with this one:

echo htmlentities($string, ENT_QUOTES);

and this hack will not work....

learn PHP guys ;))

Tue, 29 May 2007 at 01:06:05 GMT Link


28.Chris Shiflett said:

Hi Tereska,

If you think this problem has to do with whether quotes are escaped, then you're the one with some learning to do.

Because you failed to indicate the character encoding, your example is vulnerable to XSS. I'm surprised you made this particular mistake, because it's the focal point of this post.

Tue, 29 May 2007 at 01:39:39 GMT Link


29.Tereska said:

Sorry for my E ;)

Chris, I didn't want to offend anyone, so I'm sorry for my "learn PHP" sentence :) it's just a misunderstanding... :)

I'm really concerned about this XSS example and I've tried to do something to make this hack useless...

I think the KEY in this example is the 3rd parameter of htmlentities -> [, string $charset]. If I'm wrong, just correct me.

Thanks! Seeyaa!

Tue, 29 May 2007 at 23:03:05 GMT Link


30.Daniel said:

What if you convert the submitted variables to UTF-8 (or your application's encoding) before processing?

if(!is_myEncode($var)) Encode($var);

Thus, you will have consistent values.
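One way to read this suggestion, as a rough sketch that assumes the mbstring extension: normalize_to_utf8() is a hypothetical helper name, and the ISO-8859-1 fallback is an assumption about where non-UTF-8 input comes from. Note that pure-ASCII input, which includes UTF-7 payloads, already counts as valid UTF-8 and passes through unchanged, so this addresses inconsistent input rather than the UTF-7 trick on its own.

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when it is not
// already valid UTF-8. The fallback source encoding is an assumption.
function normalize_to_utf8(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

// "caf" followed by the single ISO-8859-1 byte for e-acute.
$latin1 = "caf\xE9";
$utf8   = normalize_to_utf8($latin1);

var_dump(bin2hex($utf8));
```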

Wed, 30 May 2007 at 06:23:40 GMT Link


31.Thijs Wijnmaalen said:

Does anybody know if the Smarty modifier 'escape' is vulnerable to this attack?

Wed, 20 Jun 2007 at 15:23:53 GMT Link


32.Jim said:

Brilliant article! Thanks!

Thu, 22 May 2008 at 14:22:10 GMT Link


33.Miguel Vazquez Gocobachi said:

Hi Chris,

Thanks for the article. I have a simple question that you may have answered before: which charset is best for PHP scripts? Is UTF-8 the best choice?

Thanks!

Sat, 15 Aug 2009 at 03:17:04 GMT Link


34.XSS said:

Hi,

i liked your article very much,

and i would also like to point out another article on this blog ->

http://hackerthedude.blogspot.com/2...s-phishing.html

Thanks

Thu, 17 Sep 2009 at 18:57:27 GMT Link


35.Vahagn said:

Hello Chris,

As I understand it, another possible solution is to specify the string encoding when you call htmlentities():

htmlentities($string, ENT_QUOTES, 'UTF-8');

Am I right ?

Wed, 11 Nov 2009 at 13:00:20 GMT Link


36.Chris Shiflett said:

Hi Vahagn,

Using ENT_QUOTES does render this example useless, but it is not a complete solution. There are many XSS exploits that do not require quotes.

The real solution is to call htmlentities() as you describe, but also to make sure the character encoding is indicated in the Content-Type header:

Content-Type: text/html; charset=UTF-8

If you maintain a consistent character encoding throughout, you don't have to worry about this problem.

Thu, 19 Nov 2009 at 21:19:46 GMT Link


37.Solexy said:

thx, good article

Mon, 17 May 2010 at 05:49:00 GMT Link


38.Sky said:

Hello

Why not use htmlentities($var, ENT_QUOTES, 'UTF-8');

It's not perfect and a consistent charset policy is much better, but this can be useful.

Thu, 22 Jul 2010 at 07:37:31 GMT Link


39.Chris Shiflett said:

Hi Sky,

Using htmlentities($var, ENT_QUOTES, 'UTF-8') is a good practice, but it doesn't solve the problem entirely. If you try that with this example, you'll notice the XSS doesn't work, but that's only because this particular example uses quotes. There are XSS attacks that do not rely on quotes, and those will still work. View source to see what I mean, or try the example using htmlentities($var, ENT_COMPAT, 'UTF-8'), and you'll see that it still works.
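A quick way to see the difference between the two flags, sketched against the UTF-7 output quoted in comment 5 (variable names are illustrative): ENT_QUOTES changes the string only because this payload happens to contain single quotes, while ENT_COMPAT leaves it exactly as it was.

```php
<?php
// UTF-7 form of <script>alert('XSS');</script>, as quoted in comment 5.
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

$withQuotes = htmlentities($utf7, ENT_QUOTES, 'UTF-8');
$compat     = htmlentities($utf7, ENT_COMPAT, 'UTF-8');

var_dump($withQuotes === $utf7); // bool(false): the single quotes became &#039;
var_dump($compat === $utf7);     // bool(true): nothing in the string was escaped
```

In both cases the +ADw- and +AD4- sequences pass through untouched, which is why escaping the quotes is incidental rather than a fix.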

Hope that clarifies things. :-)

Thu, 22 Jul 2010 at 19:05:13 GMT Link


40.Stock Market Today said:

In this particular case, perhaps that’s true, but the more often this sort of thing happens.

best ed treatment

Sun, 12 Jun 2016 at 09:01:33 GMT Link

