About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.

Google XSS Example

In the comments to my previous blog post, Ivo Jansch asks:

To be able to comprehend how this may affect my website, could you explain how this could be exploited, even though you cannot demonstrate it?

Rather than offer another vague answer, I decided to provide a very simple proof of concept that demonstrates how character encoding inconsistencies can bite you. Google's vulnerability has of course been fixed, but with a simple PHP script, we can reproduce the situation:

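<?php
// Tell the browser to interpret the page as UTF-7
header('Content-Type: text/html; charset=UTF-7');
$string = "<script>alert('XSS');</script>";
// Encode the payload as UTF-7 (requires the mbstring extension)
$string = mb_convert_encoding($string, 'UTF-7');
// No charset argument: htmlentities() falls back to its default encoding, not UTF-7
echo htmlentities($string);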

If you run this PHP script, you should see a popup window.

Although the output is escaped with htmlentities(), the JavaScript is still executed by the browser.

The example attack is a UTF-7 string (I just use mb_convert_encoding() for this demonstration), and the browser interprets the page as UTF-7 due to the Content-Type header. Internet Explorer makes this assumption automatically (thus, you can remove the explicit header() call), but this example should work in any browser.
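To see why the escaping has no effect, here is a minimal sketch that hard-codes the UTF-7 form of the payload instead of calling mb_convert_encoding() (it is the same string Safari prints in the comments below). The explicit ENT_COMPAT flag and ISO-8859-1 charset are assumptions on my part, chosen to mirror the defaults htmlentities() had when this was written:

```php
<?php
// UTF-7 form of <script>alert('XSS');</script>
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

// Assumed flags and charset: ENT_COMPAT escapes &, <, > and double quotes only
$escaped = htmlentities($utf7, ENT_COMPAT, 'ISO-8859-1');

// None of those characters occur in the UTF-7 text, so nothing is rewritten
var_dump($escaped === $utf7); // bool(true)
```

The browser, told that the page is UTF-7, then decodes +ADw- and +AD4- back into angle brackets after the escaping step has already run.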

Hopefully developers will begin to appreciate the necessity of character encoding consistency. If anyone ever tries to claim that it doesn't matter, you can point them here. :-)

About this post

Google XSS Example was posted on Wed, 21 Dec 2005.


1.Mike (SpikeZ - Sitepoint) said:

Hi Chris,

Good article and very helpful.

Shows a glaring weakness in many 'secure' sites.



Wed, 21 Dec 2005 at 20:47:24 GMT Link

2.Alex said:

Once upon a time there was a security expert whom-must-not-be-named.

Day after day he consulted others and guided them on preventing XSS. He likes to criticize everyone else and hates it when someone does the same to him. Then he starts to cry and delete comments, so the next time he criticizes a guy for insecurity he still has a clean record.

Some (let's call them death eaters) try to protect him.

But like in each fairy story, some day there will be a happy end - and he-who-must-not-be-named will fall.....

Wed, 21 Dec 2005 at 21:19:55 GMT Link

3.Ivo Jansch said:

Thanks Chris, the example is very clear now :)

Wed, 21 Dec 2005 at 21:56:14 GMT Link

4.Ilia Alshanetsky said:

Actually the problem is not limited to Internet Explorer; Mozilla Firefox 1.5 exhibits the exact same behaviour.

If you enable automatic character set detection either browser will trigger the XSS without the call to the header() function. The difference is that in Firefox to trigger the header-less problem the auto-detection needs to be configured to detect utf-7. If it is not, then the exploit does not happen.

Wed, 21 Dec 2005 at 23:35:53 GMT Link

5.Josh Dechant said:

Try as I might, this exploit does not work in Safari. It does in Firefox Mac, but Safari won't have it and just prints out "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-"

Thu, 22 Dec 2005 at 14:18:45 GMT Link

6.Chris Shiflett said:

Hi Josh,

I just tried in Safari and got the same results.

If you go to View > Text Encoding, you'll see that UTF-7 is not an encoding that Safari supports, so that's probably why.

Thu, 22 Dec 2005 at 14:33:14 GMT Link

7.Josh Dechant said:

Chris, yes after I posted I noticed that Safari does not support UTF-7. But if you think about it, it really does make sense for a browser to not support UTF-7 since it is largely meant for email. Though Mail.app doesn't have UTF-7 either, and nor does TextEdit or any other Apple App that I can see, so it seems that Apple has made a blanket decision to not support UTF-7 for one reason or another.

Ironically enough though, removing the explicit header, and the dead IE 5 Mac does not automatically set the encoding to UTF-7... And unless you really dig into FireFox, it won't autodetect UTF-7 either, so while I agree we should be consistent in our encoding, this seems to equally be an XSS and IE PC exploit.

Thu, 22 Dec 2005 at 14:49:20 GMT Link

8.DewChugr said:

As someone fairly new to PHP and web programming I have read several of your articles and I have to say they are filled with great information. It's nice that you share this stuff with everyone. I don't really get all this UTF stuff yet so I'll google around and try to learn some more.


Thu, 22 Dec 2005 at 22:26:17 GMT Link

9.Chris Shiflett said:

Thanks for the kind words. I really appreciate it. :-)

Andrei Zmievski is working on PHP's Unicode support and has given some good talks on the topic:


Wikipedia has a pretty good description of UTF-7:


Skip down to the description, as I think it's the most informative.

Hope that helps!

Thu, 22 Dec 2005 at 23:29:51 GMT Link

10.DewChugr said:

Thanks for the links, very helpful.

btw, this blog/commenting system. Did you code it or is it a package? It's very slick...

Fri, 23 Dec 2005 at 14:54:53 GMT Link

11.Harry Fuecks said:


UTF-7 would be considered "valid" by common techniques used to validate encodings e.g. this regex: http://www.w3.org/International/questions/qa-forms-utf-8 - UTF-7 would pass, being in the ASCII range.

So we're saying the only time this is a risk is if it's left up to the browser to guess the encoding? I.e. make sure you declare the charset with a header or a meta tag?
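A quick sketch of the first point, assuming the payload string quoted in Josh's comment above: the UTF-7 text is pure ASCII, so a UTF-8 validity check accepts it. The empty pattern with the /u modifier is just one common way to test for valid UTF-8; the linked W3C regex is not reproduced here.

```php
<?php
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

// preg_match() with /u fails on invalid UTF-8, so a match means "valid UTF-8"
$isValidUtf8 = preg_match('//u', $utf7) === 1;

// Every byte is below 0x80, i.e. the string is plain ASCII
$isAscii = preg_match('/^[\x00-\x7F]*$/', $utf7) === 1;

var_dump($isValidUtf8, $isAscii); // bool(true) bool(true)
```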

Sun, 25 Dec 2005 at 01:00:42 GMT Link

12.joh said:

if you want to learn about Unicode, you should start here:

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"


Sun, 25 Dec 2005 at 08:51:55 GMT Link

13.Miggy said:

That was shown on http://ha.ckers.org/xss.html a while back too.

Mon, 02 Jan 2006 at 00:10:45 GMT Link

14.Paul Davey said:

Is it because the htmlentities function is not compatible with utf-7 and hence does not alter the HTML? I did this for example (after your code):

if (htmlentities($string) == $string) {
    echo "NO CHANGE!";
}

And it displayed NO CHANGE

Sun, 15 Jan 2006 at 16:36:52 GMT Link

15.Chris Shiflett said:

Harry, I would add one more thing - we should indicate the charset in both our Content-Type headers and in our htmlentities() calls.

The previous post might make this clearer and provides an example:


Paul, this example works because htmlentities() assumes ISO-8859-1 by default, but the content is UTF-7. This mismatch causes htmlentities() to misinterpret characters.

We want to make sure our escaping functions and the remote systems to which we're sending data interpret data consistently, otherwise vulnerabilities like this are possible.
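A minimal sketch of what that consistency could look like. The choice of UTF-8, the APP_CHARSET constant, and the escape_html() helper are all assumptions for illustration; none of them are part of PHP or of the original example.

```php
<?php
// Assumption: the application standardizes on UTF-8 everywhere
const APP_CHARSET = 'UTF-8';

// Hypothetical helper: escape with the same charset the page declares
function escape_html(string $value): string
{
    return htmlentities($value, ENT_QUOTES, APP_CHARSET);
}

// Declare the charset so the browser does not have to guess
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . APP_CHARSET);
}

echo escape_html("<script>alert('XSS');</script>");
// prints: &lt;script&gt;alert(&#039;XSS&#039;);&lt;/script&gt;
```

Keeping the charset in one constant means the header and the escaping call cannot drift apart.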

Sun, 15 Jan 2006 at 16:54:05 GMT Link

16.Luis said:

the example is sui generis because UTF-7 is not supported by htmlentities(). passing UTF-8 to it avoids the popup but you see no text either (but looking at the source you see something like the safari output above). on the other hand if the encoding had been UTF-8, not passing it to htmlentities() would not cause a popup either, probably because for this example UTF-8 and ISO-8859-1 are the same. so the example is good but not perfect.

Thu, 02 Feb 2006 at 16:52:59 GMT Link

17.Chris Shiflett said:

Hi Luis,

The example is only meant to reproduce Google's XSS vulnerability and highlight the importance of character encoding consistency. It's not a contrived example.

The idea isn't to use UTF-7 but to show that it's worth being explicit about which character encoding you're using. By specifying this in the Content-Type header and the htmlentities() function call, you're protected from these types of vulnerabilities.

Thu, 02 Feb 2006 at 17:20:14 GMT Link

18.Steph said:

In regards to the comment from "Alex": I have no idea who you are, but wtf are you doing posting your trivial life problems with someone in the comments? I think it would be best if the blogger deleted your comment and you stuck to something on topic regarding character encoding and XSS issues, not some "I want to sound smart with my metaphors" useless comments.

If you do have a valid point regarding something, well then, post a link to somewhere where you might have a sensical discussion of the issues, not some fairy tale garbage.

I'd like to point out that I am all for your useless comments being removed; they serve no purpose on this blog.

Also note that I will not be offended if the blogger does not wish this comment section to become a war of words and deletes my comment. It's his blog and he should make the decisions based on what he thinks is best for his readers.

Wed, 01 Mar 2006 at 05:04:53 GMT Link

19.Nate Klaiber said:


I wouldn't get too worked up over 'Alex' - people like this, who hide behind the Internet, lose all respect and credibility anyway. So, the whiny attitude and comment approach he has taken holds no water, and deserves to be removed.

C'mon, if you are going to act tough - at least post some contact information. Some people need to grow up, period. I say delete the comments, they are useless and provide nothing of worth here - He just needs to go back to hardened-php...er...um....

Back on topic - I have tried this (as others have) in Safari and have been unable to duplicate any results (but I understand some may have been fixed, etc). I still want to see more of this with working examples - so I will test it later on a local machine. I understand what is being said, but sometimes it's easier to see HOW it could maliciously affect you if ignored.

Thanks for the great information....

Wed, 01 Mar 2006 at 15:25:49 GMT Link

20.Chris Shiflett said:

I don't like to delete comments, so I try to only do so when they're spam, off-topic, or flagrant.

Nate, you won't be able to try this in Safari, because it doesn't support UTF-7. In browsers that automatically detect the encoding, you can remove the header() call, which more closely resembles the problem Google was having.

Wed, 01 Mar 2006 at 15:53:54 GMT Link

21.Steven Roddis said:

Please note table.2:


It shows the character sets that are supported in PHP 4.3.0 and later. UTF-7 is not one, therefore it is not going to escape it.

So the problem is that the developer did not understand which character sets are supported and which aren't.


Wed, 19 Apr 2006 at 06:31:36 GMT Link

22.its not important to know my name said:

What exactly was the worst case possibility of Google's vulnerability? How could it have been used to bypass security?

Sat, 22 Apr 2006 at 17:43:51 GMT Link

23.Michael said:

Editorial translation/audio Editorial translation/audio

Fri, 04 Aug 2006 at 00:17:16 GMT Link

24.Mikispag said:

Great tip! Thanks!

Mon, 27 Nov 2006 at 15:35:27 GMT Link

25.SEO Blog said:

Very good informations to get a secure website. I will check this. Thanks!

Sat, 16 Dec 2006 at 18:02:48 GMT Link

26.Bourse said:

I never imagined that character encoding could be that important! I am not an expert in web programming and I am new to the subject of cross-site scripting (actually I am currently developing a website and this is the first time I am doing the coding entirely on my own) so a big thanks from me for sharing such useful information on your site. I have already read several articles here that really came in useful.

Sun, 11 Mar 2007 at 15:43:25 GMT Link

27.Tereska said:

Replace last line with this one:

echo htmlentities($string, ENT_QUOTES);

and this hack will not work....

learn PHP guys ;))

Tue, 29 May 2007 at 01:06:05 GMT Link

28.Chris Shiflett said:

Hi Tereska,

If you think this problem has to do with whether quotes are escaped, then you're the one with some learning to do.

Because you failed to indicate the character encoding, your example is vulnerable to XSS. I'm surprised you made this particular mistake, because it's the focal point of this post.

Tue, 29 May 2007 at 01:39:39 GMT Link

29.Tereska said:

Sorry for my E ;)

Chris, I didn't want to offend anyone so I'm sorry for my "learn PHP" sentence :) it's just a misunderstanding... :)

I'm really concerned about this XSS example and I've tried to do something to make this hack useless...

I think the KEY in this example is htmlentities 3rd parameter -> [, string $charset]. If I'm wrong just correct me.

Thanks! Seeyaa!

Tue, 29 May 2007 at 23:03:05 GMT Link

30.Daniel said:

What if you convert the submitted variables to UTF-8 (or your application encoding) before processing?

if(!is_myEncode($var)) Encode($var);

Thus, you will have consistent values.
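One way this idea might be sketched with the mbstring extension. The normalize_input() helper, the APP_CHARSET constant, and the ISO-8859-1 fallback are hypothetical stand-ins for the is_myEncode()/Encode() pair above, not anything defined in the post.

```php
<?php
// Assumption: the application encoding is UTF-8
const APP_CHARSET = 'UTF-8';

// Hypothetical helper: leave valid input alone, otherwise convert it
// from an assumed legacy source encoding
function normalize_input(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, APP_CHARSET)) {
        return $value;
    }
    return mb_convert_encoding($value, APP_CHARSET, $assumedSource);
}

// "café" in ISO-8859-1 is invalid UTF-8, so it gets converted
var_dump(normalize_input("caf\xE9") === "caf\xC3\xA9"); // bool(true)

// A UTF-7 payload is plain ASCII, so it is already "valid" and passes through
var_dump(normalize_input("+ADw-script+AD4-") === "+ADw-script+AD4-"); // bool(true)
```

The second check is the catch: normalizing input does not neutralize a UTF-7 payload, so the output charset still has to be declared.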

Wed, 30 May 2007 at 06:23:40 GMT Link

31.Thijs Wijnmaalen said:

Does anybody know if the Smarty modifier 'escape' is vulnerable to this attack?

Wed, 20 Jun 2007 at 15:23:53 GMT Link

32.Jim said:

Brilliant article! Thanks!

Thu, 22 May 2008 at 14:22:10 GMT Link

33.Miguel Vazquez Gocobachi said:

Hi Chris,

Thanks for the article. I have a simple question that you may have answered before: which charset is best for PHP scripts? Is UTF-8 the best choice?


Sat, 15 Aug 2009 at 03:17:04 GMT Link

34.XSS said:


i liked your article very much,

and i would also like to point out another article on this blog ->



Thu, 17 Sep 2009 at 18:57:27 GMT Link

35.Vahagn said:

Hello Chris,

as I understand another possible solution to this, is to specify the string encoding when you call htmlentities():

htmlentities($string, ENT_QUOTES, 'UTF-8');

Am I right ?

Wed, 11 Nov 2009 at 13:00:20 GMT Link

36.Chris Shiflett said:

Hi Vahagn,

Using ENT_QUOTES does render this example useless, but it is not a complete solution. There are many XSS exploits that do not require quotes.

The real solution is to call htmlentities() as you describe, but also to make sure the character encoding is indicated in the Content-Type header:

Content-Type: text/html; charset=UTF-8

If you maintain a consistent character encoding throughout, you don't have to worry about this problem.

Thu, 19 Nov 2009 at 21:19:46 GMT Link

37.Solexy said:

thx, good article

Mon, 17 May 2010 at 05:49:00 GMT Link

38.Sky said:


Why not use htmlentities($var, ENT_QUOTES, 'UTF-8');

It's not perfect and a consistent charset policy is much better, but this can be useful.

Thu, 22 Jul 2010 at 07:37:31 GMT Link

39.Chris Shiflett said:

Hi Sky,

Using htmlentities($var, ENT_QUOTES, 'UTF-8') is a good practice, but it doesn't solve the problem entirely. If you try that with this example, you'll notice the XSS doesn't work, but that's only because this particular example uses quotes. There are XSS attacks that do not rely on quotes, and those will still work. View source to see what I mean, or try the example using htmlentities($var, ENT_COMPAT, 'UTF-8'), and you'll see that it still works.
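A small sketch of the difference, using the UTF-7 form of the payload as a hard-coded string (an assumption made so the snippet runs without mbstring):

```php
<?php
$utf7 = "+ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-";

// ENT_QUOTES rewrites the single quotes, which happens to break this payload
$withQuotes = htmlentities($utf7, ENT_QUOTES, 'UTF-8');
echo $withQuotes, "\n";
// prints: +ADw-script+AD4-alert(&#039;XSS&#039;)+ADsAPA-/script+AD4-

// ENT_COMPAT leaves single quotes alone, so the payload is untouched
$withCompat = htmlentities($utf7, ENT_COMPAT, 'UTF-8');
echo $withCompat, "\n";
// prints: +ADw-script+AD4-alert('XSS')+ADsAPA-/script+AD4-
```

In both cases the encoded angle brackets survive, which is why a payload that avoids quotes would still get through.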

Hope that clarifies things. :-)

Thu, 22 Jul 2010 at 19:05:13 GMT Link
