About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.


Convert Smart Quotes with PHP

A question that comes up pretty frequently on various PHP mailing lists is how to convert "smart quotes" to real quotes. A year or two ago on the NYPHP mailing list, Dan Convissor provided a simple example that performs such a conversion. I've modified it slightly to fit my own style preferences:

<?php

function convert_smart_quotes($string)
{
    $search = array(chr(145),
                    chr(146),
                    chr(147),
                    chr(148),
                    chr(151));

    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');

    return str_replace($search, $replace, $string);
}

?>

(This function also converts an em dash into a hyphen.)
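
As an alternative sketch, the same mapping can be written with strtr() and an associative array, which keeps each Windows-1252 byte next to its replacement. The function name convert_smart_quotes_map is hypothetical, and the input is assumed to be Windows-1252 encoded, as with the function above.

```php
<?php

// Hypothetical variant of convert_smart_quotes(); assumes Windows-1252 input.
function convert_smart_quotes_map($string)
{
    $map = array(
        chr(145) => "'",  // left single quote
        chr(146) => "'",  // right single quote
        chr(147) => '"',  // left double quote
        chr(148) => '"',  // right double quote
        chr(151) => '-',  // em dash
    );

    // With an array argument, strtr() replaces each key with its value.
    return strtr($string, $map);
}

echo convert_smart_quotes_map(chr(147) . 'It' . chr(146) . 's here' . chr(148));
// prints "It's here"
```

Whether str_replace() or strtr() reads better is mostly a matter of taste; the map form just makes it harder to let the two arrays drift out of step.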

If you want "smart quotes" to actually appear in a browser, you can use the following $replace array to convert each of these characters into HTML entities:

<?php

$replace = array('&lsquo;',
                 '&rsquo;',
                 '&ldquo;',
                 '&rdquo;',
                 '&mdash;');

?>
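
Put together, an entity-producing version might look like the sketch below. The name convert_smart_quotes_to_entities is hypothetical; it reuses the $search values from the first function and, like it, assumes Windows-1252 input.

```php
<?php

// Hypothetical name; same search bytes as convert_smart_quotes().
function convert_smart_quotes_to_entities($string)
{
    $search = array(chr(145), chr(146), chr(147), chr(148), chr(151));

    $replace = array('&lsquo;', '&rsquo;', '&ldquo;', '&rdquo;', '&mdash;');

    return str_replace($search, $replace, $string);
}

echo convert_smart_quotes_to_entities(chr(147) . 'quoted' . chr(148));
// prints &ldquo;quoted&rdquo;
```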

These entities render properly in most browsers I've tried, including lynx. Here's some example HTML:

&lsquo;single quotes&rsquo; 
&ldquo;double quotes&rdquo;
em&mdash;dash

Here is how these entities render in your browser:

‘single quotes’
“double quotes”
em—dash

The htmlentities() man page has other useful examples in the user notes at the bottom, some of which convert a wide variety of characters into valid HTML entities.

About this post

Convert Smart Quotes with PHP was posted on Mon, 31 Oct 2005.

25 comments

1.Krijn Hoetmer said:

If you want to use 'raw utf-8' (which Lynx can handle as well) you can use http://ktk.xs4all.nl/stuff/php/converting-numeric-character-references/ to convert those numeric character references.

Tue, 01 Nov 2005 at 10:03:53 GMT Link


2.Ben Ramsey said:

You can also use the following character entity references to achieve the same:

&lsquo; - left single quote

&rsquo; - right single quote

&ldquo; - left double quote

&rdquo; - right double quote

&mdash; - em dash

Though, I'm not sure how many clients support these. You can find more here: http://www.htmlhelp.com/reference/html40/entities/special.html

Tue, 01 Nov 2005 at 14:47:45 GMT Link


3.John Wilkins said:

Chris, the numbered HTML entities that you use (145-151) are not valid Extended ASCII codes; they are Windows Extended ASCII.

And on non-Windows boxes, it depends on which font is used on whether you will see the proper character or a gibberish character (question marks, lower-case i with an accent, and square boxes are common.)

If you want curly quotes, em and en dashes, ellipsis, etc., make sure you use the HTML entities Ben Ramsey pointed out: http://www.htmlhelp.com/reference/html40/entities/special.html

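Here are the conversion arrays that I use in my code (note that they convert Windows Extended ASCII codes into standard codes):

/**
 *  ‘  8216  curly left single quote
 *  ’  8217  apostrophe, curly right single quote
 *  “  8220  curly left double quote
 *  ”  8221  curly right double quote
 *  —  8212  em dash
 *  –  8211  en dash
 *  …  8230  ellipsis
 */
$search = array(
                '&',
                '<',
                '>',
                '"',
                chr(212),  // Mac OS Roman left single quote
                chr(213),  // Mac OS Roman right single quote
                chr(210),  // Mac OS Roman left double quote
                chr(211),  // Mac OS Roman right double quote
                chr(209),  // Mac OS Roman em dash
                chr(208),  // Mac OS Roman en dash
                chr(201),  // Mac OS Roman ellipsis
                chr(145),  // Windows left single quote
                chr(146),  // Windows right single quote
                chr(147),  // Windows left double quote
                chr(148),  // Windows right double quote
                chr(151),  // Windows em dash
                chr(150),  // Windows en dash
                chr(133)   // Windows ellipsis
                );
$replace = array(
                '&amp;',
                '&lt;',
                '&gt;',
                '&quot;',
                '&#8216;',
                '&#8217;',
                '&#8220;',
                '&#8221;',
                '&#8212;',
                '&#8211;',
                '&#8230;',
                '&#8216;',
                '&#8217;',
                '&#8220;',
                '&#8221;',
                '&#8212;',
                '&#8211;',
                '&#8230;'
                );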

Tue, 01 Nov 2005 at 16:54:40 GMT Link


4.András Bártházi said:

And if we're talking about quotes, don't forget to mention the HTML element "<q>", which is for proper quoting. Unfortunately, it's not supported by all browsers. :(

Thu, 03 Nov 2005 at 07:06:20 GMT Link


5.Nev said:

Hi @ll

Why don't you use htmlentities() with parameters?

htmlentities ( string string [, int quote_style [, string charset]] )

like this:

if (0 < sizeof($_POST)) {
    foreach ($_POST as $key => $val) {
        if (is_array($val)) {
            foreach ($val as $a_key => $a_val) {
                if (is_array($a_val)) {
                    foreach ($a_val as $a_a_key => $a_a_val) {
                        $_POST[$key][$a_key][$a_a_key] = htmlentities(stripslashes(trim($a_a_val)), ENT_QUOTES, 'ISO-8859-1');
                    }
                } else {
                    $_POST[$key][$a_key] = htmlentities(stripslashes(trim($a_val)), ENT_QUOTES, 'ISO-8859-1');
                }
            }
        } else {
            $_POST[$key] = htmlentities(stripslashes(trim($val)), ENT_QUOTES, 'ISO-8859-1');
        }
    }
}

Sat, 05 Nov 2005 at 10:31:38 GMT Link
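
The nested loops above handle arrays up to three levels deep. A more compact sketch of the same idea uses array_walk_recursive(), which visits every leaf value at any depth. The function name entity_encode_input is hypothetical, and stripslashes() is left out on the assumption that magic quotes are not in effect.

```php
<?php

// Hypothetical helper: trim and entity-encode every string in a nested array.
function entity_encode_input(array $input)
{
    // The callback receives each leaf value by reference, at any depth.
    array_walk_recursive($input, function (&$value) {
        if (is_string($value)) {
            $value = htmlentities(trim($value), ENT_QUOTES, 'ISO-8859-1');
        }
    });

    return $input;
}

$clean = entity_encode_input(array(
    'name' => ' <b>Chris</b> ',
    'tags' => array('a&b', array("it's")),
));

print_r($clean);
```

In a form handler this could be applied as $_POST = entity_encode_input($_POST); though encoding at output time rather than on input is a design choice worth weighing.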


6.Chris Shiflett said:

Thanks Ben and John. I've updated the entry.

Sun, 06 Nov 2005 at 04:46:02 GMT Link


7.Douglas Clifton said:

Also, for XHTML markup, you should avoid using the named character entities. Use either the decimal or hex (UTF) equivalents. I tend to use many of these mixed in with normal markup, and have found that a MySQL look-up table mapped to a PHP array works really well for this sort of thing.

Have a look at:

http://loadaveragezero.com/app/dbro...es/unicode/data

for more information on this technique. ~d

Tue, 08 Nov 2005 at 20:18:48 GMT Link


8.Douglas Clifton said:

I also recommend taking a look at the PHP port of Smartypants:

http://www.michelf.com/projects/php-smartypants/

Tue, 08 Nov 2005 at 20:22:26 GMT Link


9.Don Laur said:

It worked great for me, very useful for moving from Word to the web.

Thanks!

Wed, 16 Nov 2005 at 20:08:12 GMT Link


10.marcus said:

I realized I'm about 7 months late! :)

I got hacked off with wordpress, etc trying to "interpret" my HTML, etc - so wrote a tool that uses Perl to convert characters to ASCII (albeit probably not extended).

Wasn't till after I worked out how to turn off the rich text writer under wordpress. ;)

It works wonders for me ... although might not suit every browser, situation, etc. Only thing that breaks it, that I've found is ampersands on firefox.

Sun, 18 Jun 2006 at 17:15:44 GMT Link


11.Richard Lynch said:

There's a buttload of other solutions on php.net in str_replace (link above) and other places.

All kinds of "fun" non-standard Windows byte-codes are documented there that you have to convert to something useful for the 'net, cuz your users are sooo windows-centric they think everybody sees what they see.

Mon, 02 Oct 2006 at 22:19:32 GMT Link


12.Jonas said:

"<q>" is the proper tag for quoting in HTML, isn't it?

Fri, 01 Dec 2006 at 08:21:04 GMT Link


13.Mark said:

After doing a hex dump of some incoming data that contained smart quotes, I noticed 3-byte sequences that represented the UTF-8 quotes. Here's a modified version of your function for dealing with UTF-8 smart quotes:

<?php

function convert_raw_utf8_smart_quotes($string)
{
  $search = array(chr(0xe2) . chr(0x80) . chr(0x98),
                  chr(0xe2) . chr(0x80) . chr(0x99),
                  chr(0xe2) . chr(0x80) . chr(0x9c),
                  chr(0xe2) . chr(0x80) . chr(0x9d),
                  chr(0xe2) . chr(0x80) . chr(0x93),
                  chr(0xe2) . chr(0x80) . chr(0x94));

  $replace = array('&lsquo;',
                   '&rsquo;',
                   '&ldquo;',
                   '&rdquo;',
                   '&ndash;',
                   '&mdash;');

  return str_replace($search, $replace, $string);
}

?>

I found a good ASCII table here: http://www.manderby.com/mandalex/a/ascii.php

Wed, 16 May 2007 at 19:40:56 GMT Link


14.John said:

Thanks Mark, that works great! I added one more character, for the ellipsis (...), which has also given me some problems.

I added to the array:

chr(0xe2) . chr(0x80) . chr(0xa6)

And put as replacement: '...'

Mon, 09 Jul 2007 at 01:39:34 GMT Link
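
For UTF-8 input, the same replacements can also be sketched with Unicode code points instead of raw byte sequences, using preg_replace() with the u modifier. The function name below is hypothetical, and the input is assumed to be valid UTF-8 (with the u modifier, preg_replace() returns null when the subject is not valid UTF-8).

```php
<?php

// Hypothetical name; assumes $string is valid UTF-8.
function utf8_smart_quotes_to_ascii($string)
{
    $patterns = array(
        '/[\x{2018}\x{2019}]/u', // left and right single quotes
        '/[\x{201C}\x{201D}]/u', // left and right double quotes
        '/[\x{2013}\x{2014}]/u', // en dash and em dash
        '/\x{2026}/u',           // ellipsis
    );

    $replacements = array("'", '"', '-', '...');

    return preg_replace($patterns, $replacements, $string);
}

echo utf8_smart_quotes_to_ascii("\xe2\x80\x9cWait\xe2\x80\xa6\xe2\x80\x9d");
// prints "Wait..."
```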


15.Dan said:

Just wanted to say thanks -- this entry was VERY helpful to me. I ended up using Chris' script with a few entries from John's comment.

Cheers!

Wed, 28 Nov 2007 at 07:39:11 GMT Link


16.Adam Bergstein said:

I know this is very basic, but it helped me out. Use at your own risk...

<?php

function fixcurlys($str){
    $replacement = str_replace('%93', '"', urlencode($str));
    return urldecode(str_replace('%94', '"', $replacement));
}

?>

Mon, 07 Jan 2008 at 19:26:07 GMT Link


17.Bob said:

Chris: I spent numerous hours over the past few weeks, trying to find a way to identify and replace smart quotes!

Your first reference code above solved my problem immediately!

Thank you!

Mon, 21 Apr 2008 at 03:02:53 GMT Link


18.Nathan said:

thank god. this was the bane of my existence. appreciate the help!

Sat, 26 Apr 2008 at 08:25:35 GMT Link


19.Zac said:

Awesome code! Thanks!

Sun, 11 May 2008 at 20:49:55 GMT Link


20.Dave said:

I know this is an old post, just wanted to say I ended up needing to use iconv to solve my encoding issue first, then convert the fancy punctuation to basic with the above functions afterwards.

This is a common situation if you're importing content that was authored using a Microsoft Office tool.

$text = iconv("Windows-1252","UTF-8//IGNORE",$text);

// or something like this for multibyte:

//$text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');

// now convert to standard ascii with convert_smart_quotes function.

Fri, 19 Dec 2008 at 17:52:00 GMT Link
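
Combining the two steps described above gives something like the following sketch. It assumes the incoming text is Windows-1252 and that the iconv extension is available; the function name is hypothetical. After conversion the punctuation is UTF-8, so the replacement step matches three-byte sequences rather than single bytes.

```php
<?php

// Hypothetical helper; assumes Windows-1252 input and the iconv extension.
function windows1252_to_plain_ascii($text)
{
    // Step 1: convert Windows-1252 bytes to UTF-8.
    $text = iconv('Windows-1252', 'UTF-8', $text);
    if ($text === false) {
        return false; // conversion failed
    }

    // Step 2: replace UTF-8 smart punctuation with plain ASCII.
    $search = array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\x93", // en dash
        "\xe2\x80\x94", // em dash
    );

    $replace = array("'", "'", '"', '"', '-', '-');

    return str_replace($search, $replace, $text);
}

echo windows1252_to_plain_ascii("\x93it\x92s\x94");
// prints "it's"
```

Other non-ASCII characters, such as accented letters, come out as UTF-8 and are left untouched by the second step.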


21.Sandy said:

I think ensuring you have consistent encoding will solve most entity issues. For example, UTF-8 on the input page, UTF-8 on the output page and UTF-8 on the database tables.

Thu, 16 Apr 2009 at 07:00:03 GMT Link


22.atpaz said:

Hi Chris, does this work for Chinese character input? Anyone? Thanks.

Thu, 30 Jul 2009 at 08:28:23 GMT Link


23.hayden said:

Apostrophes and quotations showing up as diamond with question mark inside, why?!

It's the curly apostrophe from MS Word. Here's a PHP fix:

$trimmed = str_replace("’", "'", $caption);

echo $trimmed;

Mon, 07 Dec 2009 at 23:39:00 GMT Link
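
The diamond with a question mark is typically how a browser shows a byte that is not valid in the page's declared encoding, for example a lone Windows-1252 apostrophe (byte 0x92) on a UTF-8 page. A rough sketch of one way to handle this is to check whether the string is valid UTF-8 and convert it from Windows-1252 only if it is not. The function name is hypothetical, the Windows-1252 fallback is an assumption about where the text came from, and the iconv extension is assumed to be available.

```php
<?php

// Hypothetical helper: return $text as UTF-8, assuming any invalid
// UTF-8 input was originally Windows-1252 (e.g. pasted from MS Word).
function ensure_utf8($text)
{
    // An empty pattern with the u modifier matches only valid UTF-8 subjects.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    return iconv('Windows-1252', 'UTF-8', $text);
}

$caption = ensure_utf8("It\x92s here");          // lone 0x92 byte: converted
$already = ensure_utf8("It\xe2\x80\x99s here");  // valid UTF-8: left alone
```

Once the text is known to be UTF-8, the curly characters can either be served as they are (with a matching charset declaration) or replaced using one of the UTF-8 functions in the earlier comments.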


24.Jon said:

Chris' code wasn't working for me, due to UTF-8 issues I think, but I found the following on the NYPHP list which works perfectly for me. It's similar to what Mark wrote above, but simpler.

http://lists.nyphp.org/pipermail/ta...ber/026947.html

/**
  * Remove unwanted MS Word high characters from a string
  *
  * @param string $string
  * @return string $string
  */
function sanitizeString($string = null)
{
    if (is_null($string)) return false;

    //-> Replace all of those weird MS Word quotes and other high characters
    $badwordchars = array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\xa6"  // ellipsis
    );
    $fixedwordchars = array(
        "'",
        "'",
        '"',
        '"',
        '...'
    );
    $string = htmlspecialchars(str_replace($badwordchars, $fixedwordchars, $string));

    //-> Convert the em dash after escaping, so the "&" in the entity is not escaped to "&amp;"
    return str_replace("\xe2\x80\x94", '&mdash;', $string);
}

Sun, 27 Jun 2010 at 13:26:47 GMT Link


25.Cory Becker said:

chr(0xe2) . chr(0x84) . chr(0xa2) for &trade;

Wed, 15 Jun 2011 at 00:09:51 GMT Link

