About the Author

Chris Shiflett

Chris Shiflett is an author and speaker who leads the web application security practice at OmniTI.


Convert Smart Quotes with PHP

A question that seems to come up pretty frequently on various PHP mailing lists is how to convert "smart quotes" to real quotes. Dan Convissor provided a simple example that performs such a conversion a year or two ago on the NYPHP mailing list. I've modified it slightly to fit my own style preferences:

<?php 

function convert_smart_quotes($string)
{
    // Windows-1252 byte values for the curly quotes and the em dash
    $search = array(chr(145),
                    chr(146),
                    chr(147),
                    chr(148),
                    chr(151));

    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');

    return str_replace($search, $replace, $string);
}

?>

(This function also converts an em dash into a hyphen.)
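As a quick check, here is a minimal usage sketch. It assumes the input is a Windows-1252 byte string (where bytes 145 to 151 are the curly quotes and dashes), and it uses strtr() with a single map rather than two parallel arrays. The name smart_quotes_to_ascii() is made up for this example.

```php
<?php

// Hypothetical helper: same idea as convert_smart_quotes(), but with one
// byte => replacement map so each pair stays together.
function smart_quotes_to_ascii($string)
{
    $map = array(
        chr(145) => "'",  // left single quote
        chr(146) => "'",  // right single quote
        chr(147) => '"',  // left double quote
        chr(148) => '"',  // right double quote
        chr(151) => '-',  // em dash
    );

    return strtr($string, $map);
}

// Build the Windows-1252 string byte by byte so the example does not
// depend on the encoding of this source file.
$input = chr(147) . 'It' . chr(146) . 's here' . chr(148) . ' ' . chr(151) . ' done';

echo smart_quotes_to_ascii($input), "\n"; // "It's here" - done
```

Text that does not contain any of these bytes passes through unchanged.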

If you want "smart quotes" to actually appear in a browser, you can use the following $replace array to convert each of these characters into HTML entities:

<?php 

$replace = array('&lsquo;',
                 '&rsquo;',
                 '&ldquo;',
                 '&rdquo;',
                 '&mdash;');

?>

These entities render properly in most browsers I've tried, including lynx. Here's some example HTML:

&lsquo;single quotes&rsquo; 
&ldquo;double quotes&rdquo;
em&mdash;dash

Here is how these entities render in your browser:

‘single quotes’
“double quotes”
em—dash

The htmlentities() man page has other useful examples in the user notes at the bottom, some of which convert a wide variety of characters into valid HTML entities.
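One caveat when emitting entities: other markup characters in the text still need escaping, and that has to happen before the entities are added, or their ampersands get escaped too. The sketch below assumes Windows-1252 input and passes 'ISO-8859-1' to htmlspecialchars() so that the high bytes pass through untouched. The name smart_quotes_to_entities() is hypothetical.

```php
<?php

function smart_quotes_to_entities($string)
{
    // Escape &, < and > first. 'ISO-8859-1' is a single-byte charset, so
    // bytes 145-151 are left alone for the next step.
    $escaped = htmlspecialchars($string, ENT_NOQUOTES, 'ISO-8859-1');

    $map = array(
        chr(145) => '&lsquo;',
        chr(146) => '&rsquo;',
        chr(147) => '&ldquo;',
        chr(148) => '&rdquo;',
        chr(151) => '&mdash;',
    );

    return strtr($escaped, $map);
}

$sample = chr(147) . 'Tom & Jerry' . chr(148) . ' ' . chr(151) . ' <b>';

echo smart_quotes_to_entities($sample), "\n";
// &ldquo;Tom &amp; Jerry&rdquo; &mdash; &lt;b&gt;
```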

About This Post

Convert Smart Quotes with PHP was posted on Mon, 31 Oct 2005 at 21:42:47 GMT.

19 Comments

1. Krijn Hoetmer said:

If you want to use 'raw utf-8' (which Lynx can handle as well) you can use http://ktk.xs4all.nl/stuff/php/converting-numeric-character-references/ to convert those numeric character references.
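For the raw UTF-8 route, PHP's own html_entity_decode() can also turn numeric character references into UTF-8 bytes when it is given 'UTF-8' as the target charset. This is a minimal sketch of that idea, not the code behind the link above.

```php
<?php

// Numeric character references for curly double quotes and an em dash.
$html = '&#8220;quoted&#8221; &#8212; done';

// Decode to raw UTF-8 bytes. ENT_QUOTES also decodes &#039; and &quot;.
$utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// bin2hex() makes the 3-byte sequence visible: e2809c is U+201C.
echo bin2hex(substr($utf8, 0, 3)), "\n"; // e2809c
```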

Tue, 01 Nov 2005 at 10:03:53 GMT


2. Ben Ramsey said:

You can also use the following character entity references to achieve the same:

&lsquo; - left single quote

&rsquo; - right single quote

&ldquo; - left double quote

&rdquo; - right double quote

&mdash; - em dash

Though, I'm not sure how many clients support these. You can find more here: http://www.htmlhelp.com/reference/html40/entities/special.html

Tue, 01 Nov 2005 at 14:47:45 GMT


3. John Wilkins said:

Chris, the numbered HTML entities that you use (145-151) are not valid Extended ASCII codes; they are Windows Extended ASCII.

And on non-Windows boxes, whether you see the proper character or a gibberish character depends on which font is used (question marks, a lowercase i with an accent, and square boxes are common).

If you want curly quotes, em and en dashes, ellipsis, etc., make sure you use the HTML entities Ben Ramsey pointed out: http://www.htmlhelp.com/reference/html40/entities/special.html

Here are the conversion arrays that I use in my code (note that they convert Windows Extended ASCII codes into standard codes):

/**
 *  ‘  8216  curly left single quote
 *  ’  8217  apostrophe, curly right single quote
 *  “  8220  curly left double quote
 *  ”  8221  curly right double quote
 *  —  8212  em dash
 *  –  8211  en dash
 *  …  8230  ellipsis
 */
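$search = array(
                '&',       // must stay first: later pairs insert ampersands
                '<',
                '>',
                '"',
                chr(212),  // Mac Roman left single quote
                chr(213),  // Mac Roman right single quote
                chr(210),  // Mac Roman left double quote
                chr(211),  // Mac Roman right double quote
                chr(209),  // Mac Roman em dash
                chr(208),  // Mac Roman en dash
                chr(201),  // Mac Roman ellipsis
                chr(145),  // Windows-1252 left single quote
                chr(146),  // Windows-1252 right single quote
                chr(147),  // Windows-1252 left double quote
                chr(148),  // Windows-1252 right double quote
                chr(151),  // Windows-1252 em dash
                chr(150),  // Windows-1252 en dash
                chr(133)   // Windows-1252 ellipsis
                );
$replace = array(
                '&amp;',
                '&lt;',
                '&gt;',
                '&quot;',
                '&#8216;',
                '&#8217;',
                '&#8220;',
                '&#8221;',
                '&#8212;',
                '&#8211;',
                '&#8230;',
                '&#8216;',
                '&#8217;',
                '&#8220;',
                '&#8221;',
                '&#8212;',
                '&#8211;',
                '&#8230;'
                );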
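The order of these arrays matters: str_replace() applies each search/replace pair in turn to the result of the previous one, so '&' has to come before any pair that produces an entity. A small sketch of the difference (the variable names are only illustrative):

```php
<?php

$input = 'R&D ' . chr(147) . 'lab' . chr(148);

// '&' first: the ampersand in the input is escaped, then entities are added.
$good = str_replace(
    array('&', chr(147), chr(148)),
    array('&amp;', '&#8220;', '&#8221;'),
    $input
);

// '&' last: the ampersands of the freshly inserted entities get escaped too.
$bad = str_replace(
    array(chr(147), chr(148), '&'),
    array('&#8220;', '&#8221;', '&amp;'),
    $input
);

echo $good, "\n"; // R&amp;D &#8220;lab&#8221;
echo $bad, "\n";  // R&amp;D &amp;#8220;lab&amp;#8221;
```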

Tue, 01 Nov 2005 at 16:54:40 GMT


4. András Bártházi said:

And if we're talking about quotes, don't forget to mention the HTML element "<q>", which is for proper quoting. Unfortunately, it's not supported by all browsers. :(

Thu, 03 Nov 2005 at 07:06:20 GMT


5. Nev said:

Hi @ll

Why don't you use htmlentities() with parameters?

htmlentities ( string string [, int quote_style [, string charset]] )

like this:

if (0 < sizeof($_POST)) {
    foreach ($_POST as $key => $val) {
        if (true == is_array($val)) {
            foreach ($val as $a_key => $a_val) {
                if (true == is_array($a_val)) {
                    foreach ($a_val as $a_a_key => $a_a_val) {
                        $_POST[$key][$a_key][$a_a_key] = htmlentities(stripslashes(trim($a_a_val)), ENT_QUOTES, 'ISO-8859-1');
                    }
                } else {
                    $_POST[$key][$a_key] = htmlentities(stripslashes(trim($a_val)), ENT_QUOTES, 'ISO-8859-1');
                }
            }
        } else {
            $_POST[$key] = htmlentities(stripslashes(trim($val)), ENT_QUOTES, 'ISO-8859-1');
        }
    }
}
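The nested loops above stop at three levels of nesting. Here is a sketch of the same idea for arbitrary depth, using array_walk_recursive(). It drops the stripslashes() call on the assumption that magic quotes are off (they were removed in PHP 5.4), and escape_input() is a made-up name.

```php
<?php

// Hypothetical helper: trims and entity-encodes every string in a nested
// array, however deep it goes. Works on a copy and returns it.
function escape_input(array $data)
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = htmlentities(trim($value), ENT_QUOTES, 'ISO-8859-1');
        }
    });

    return $data;
}

// Stand-in for $_POST, nested four levels deep.
$post = array(
    'name' => '  <Chris>  ',
    'tags' => array('a' => array('b' => array("it's"))),
);

print_r(escape_input($post));
```

Note that htmlentities() with 'ISO-8859-1' escapes markup and Latin-1 characters; it is a separate step from mapping the Windows-specific bytes 145 to 151 discussed in the post.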

Sat, 05 Nov 2005 at 10:31:38 GMT


6. Chris Shiflett said:

Thanks Ben and John. I've updated the entry.

Sun, 06 Nov 2005 at 04:46:02 GMT


7. Douglas Clifton said:

Also, for XHTML markup, you should avoid using the named character entities. Use either the decimal or hex (UTF) equivalents. I tend to use many of these mixed in with normal markup, and have found that a MySQL look-up table mapped to a PHP array works really well for this sort of thing.

Have a look at:

http://loadaveragezero.com/app/dbro...es/unicode/data

for more information on this technique. ~d

Tue, 08 Nov 2005 at 20:18:48 GMT


8. Douglas Clifton said:

I also recommend taking a look at the PHP port of Smartypants:

http://www.michelf.com/projects/php-smartypants/

Tue, 08 Nov 2005 at 20:22:26 GMT


9. Don Laur said:

It worked great for me, very useful for moving from Word to the web.

Thanks!

Wed, 16 Nov 2005 at 20:08:12 GMT


10. marcus said:

I realized I'm about 7 months late! :)

I got hacked off with wordpress, etc trying to "interpret" my HTML, etc - so wrote a tool that uses Perl to convert characters to ASCII (albeit probably not extended).

It wasn't till afterwards that I worked out how to turn off the rich text writer under wordpress. ;)

It works wonders for me ... although it might not suit every browser, situation, etc. The only thing I've found that breaks it is ampersands on Firefox.

Sun, 18 Jun 2006 at 17:15:44 GMT


11. Richard Lynch said:

There's a buttload of other solutions on php.net in str_replace (link above) and other places.

All kinds of "fun" non-standard Windows byte-codes are documented there that you have to convert to something useful for the 'net, cuz your users are sooo windows-centric they think everybody sees what they see.

Mon, 02 Oct 2006 at 22:19:32 GMT


12. Jonas said:

"<q>" is the proper tag for quoting in HTML, isn't it?

Fri, 01 Dec 2006 at 08:21:04 GMT


13. Mark said:

After doing a hex dump of some incoming data that contained smart quotes, I noticed 3-byte sequences that represented the UTF-8 quotes. Here's a modified version of your function for dealing with UTF-8 smart quotes:

<?php
 
function convert_raw_utf8_smart_quotes($string)
{
  $search = array(chr(0xe2) . chr(0x80) . chr(0x98),
                  chr(0xe2) . chr(0x80) . chr(0x99),
                  chr(0xe2) . chr(0x80) . chr(0x9c),
                  chr(0xe2) . chr(0x80) . chr(0x9d),
                  chr(0xe2) . chr(0x80) . chr(0x93),
                  chr(0xe2) . chr(0x80) . chr(0x94));
 
  $replace = array('&lsquo;',
                   '&rsquo;',
                   '&ldquo;',
                   '&rdquo;',
                   '&ndash;',
                   '&mdash;');
                   
  return str_replace($search, $replace, $string);
}
 
?>

I found a good ASCII table here: http://www.manderby.com/mandalex/a/ascii.php

Wed, 16 May 2007 at 19:40:56 GMT


14. John said:

Thanks Mark, that works great! I added one more character, for the ellipsis (...), which has also given me some problems.

I added to the array:

chr(0xe2) . chr(0x80) . chr(0xa6)

And put as the replacement: '...'
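Putting Mark's UTF-8 byte sequences and this addition together, a sketch that maps them back to plain ASCII might look like the following. It assumes PHP 7 or later for the \u{...} escape syntax, maps both dashes to a hyphen as the original post does, and uses a hypothetical function name.

```php
<?php

function utf8_smart_quotes_to_ascii($string)
{
    $map = array(
        "\u{2018}" => "'",   // e2 80 98 left single quote
        "\u{2019}" => "'",   // e2 80 99 right single quote
        "\u{201C}" => '"',   // e2 80 9c left double quote
        "\u{201D}" => '"',   // e2 80 9d right double quote
        "\u{2013}" => '-',   // e2 80 93 en dash
        "\u{2014}" => '-',   // e2 80 94 em dash
        "\u{2026}" => '...', // e2 80 a6 ellipsis
    );

    return strtr($string, $map);
}

echo utf8_smart_quotes_to_ascii("\u{201C}Wait\u{2026}\u{201D}"), "\n"; // "Wait..."
```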

Mon, 09 Jul 2007 at 01:39:34 GMT


15. Dan said:

Just wanted to say thanks -- this entry was VERY helpful to me. I ended up using Chris' script with a few entries from John's comment.

Cheers!

Wed, 28 Nov 2007 at 07:39:11 GMT


16. Adam Bergstein said:

I know this is very basic, but it helped me out. Use at your own risk...

<?php
 
function fixcurlys($str){
    $replacement = str_replace('%93', '"', urlencode($str));
    return urldecode(str_replace('%94', '"', $replacement));
}
 
?>

Mon, 07 Jan 2008 at 19:26:07 GMT


17. Bob said:

Chris: I spent numerous hours over the past few weeks, trying to find a way to identify and replace smart quotes!

Your first reference code above solved my problem immediately!

Thank you!

Mon, 21 Apr 2008 at 03:02:53 GMT


18. Nathan said:

thank god. this was the bane of my existence. appreciate the help!

Sat, 26 Apr 2008 at 08:25:35 GMT


19. Zac said:

Awesome code! Thanks!

Sun, 11 May 2008 at 20:49:55 GMT

