About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.


Formatting and Highlighting PHP Code Listings

For the impatient, here's a direct link to the example that highlights itself:

http://shiflett.org/code/highlight.php

As I mentioned in the previous post, shiflett.org is being redesigned and redeveloped from the ground up. (Nope, it's not finished yet; you'll know it when you see it.) One of the things I want to improve is commenting. This blog has been getting a lot of comments, and I really appreciate that. (Thanks!) Since the topics I talk about (PHP, MySQL, etc.) are technical, I want to let you add formatted code listings to your comments.

I've been playing with this tonight. Feel free to follow along as I go. The first thing you want to do is create an ordered list from the code you want to format ($code in these examples). This provides line numbers, among other things:

<?php
 
/* HTML Output */
$html = array();
 
/* Normalize Newlines */
$code = str_replace(array("\r\n", "\r"), "\n", $code);
$code = preg_replace("!\n\n\n+!", "\n\n", $code);
 
$lines = explode("\n", $code);
 
/* Output Listing */
echo "<ol class=\"code\">\n";
foreach ($lines as $line) {
    if ($line === '') {
        $html['line'] = '&#160;';
    } else {
        $html['line'] = htmlentities($line, ENT_QUOTES, 'UTF-8');
    }
 
    echo "  <li><code>{$html['line']}</code></li>\n";
}
echo "</ol>\n";
 
?>

In order to make <code> tags preserve whitespace, you can add this to your CSS:

code {
    white-space: pre;
}

Pretty easy, right? Now that you have a good foundation, you can start to improve it. First, add class="even" to every other list item:

<?php
 
foreach ($lines as $key => $line) {
    if ($line === '') {
        $html['line'] = '&#160;';
    } else {
        $html['line'] = htmlentities($line, ENT_QUOTES, 'UTF-8');
    }
 
    if ($key % 2) {
        echo "    <li class=\"even\"><code>{$html['line']}</code></li>\n";
    } else {
        echo "    <li><code>{$html['line']}</code></li>\n";
    }
}
 
?>

This lets you add a subtle background color to the even rows, making the code easier to read:

ol.code li.even {
    background:#f3f3f0;
}
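
Putting the pieces so far together, a minimal sketch might look like the following. The function name code_listing() is hypothetical (it isn't part of the example linked above), and it returns the markup instead of echoing it, which is just one possible arrangement:

```php
<?php

/**
 * Build an ordered list from a block of code.
 * code_listing() is a hypothetical name used for illustration only.
 */
function code_listing($code)
{
    /* Normalize newlines, treating "\r\n" as a single line break */
    $code = str_replace(array("\r\n", "\r"), "\n", $code);
    $code = preg_replace("!\n\n\n+!", "\n\n", $code);

    $out = "<ol class=\"code\">\n";

    foreach (explode("\n", $code) as $key => $line) {
        /* Blank lines get a non-breaking space so the list item keeps its height */
        if ($line === '') {
            $text = '&#160;';
        } else {
            $text = htmlentities($line, ENT_QUOTES, 'UTF-8');
        }

        /* Keys start at 0, so odd keys are the even-numbered lines */
        $class = ($key % 2) ? ' class="even"' : '';

        $out .= "  <li{$class}><code>{$text}</code></li>\n";
    }

    return $out . "</ol>\n";
}

echo code_listing("<?php\n\necho 'Hello';\n");
```

Returning a string makes it easier to reuse the listing elsewhere (in a comment template, for example), but echoing directly as in the snippets above works just as well.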

The next step is to add syntax highlighting. This is a bit more involved, but only if you're picky. (I am.) You can use token_get_all() and loop through the tokens yourself, or you can use highlight_string() and try to clean up its output. I have chosen the latter.
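
For comparison, here is a rough sketch of the token_get_all() route. It is not the approach used in the rest of this post. The function name highlight_tokens() is hypothetical, and the way token types are grouped into class names is an assumption made for illustration, not PHP's own mapping:

```php
<?php

/**
 * Wrap each PHP token in a <span> with a class name.
 * The grouping of token types below is an assumption for this sketch.
 */
function highlight_tokens($code)
{
    $classes = array(
        T_COMMENT                  => 'comment',
        T_DOC_COMMENT              => 'comment',
        T_CONSTANT_ENCAPSED_STRING => 'string',
        T_ENCAPSED_AND_WHITESPACE  => 'string',
        T_INLINE_HTML              => 'html',
        T_OPEN_TAG                 => 'default',
        T_CLOSE_TAG                => 'default',
        T_VARIABLE                 => 'default',
        T_STRING                   => 'default',
        T_LNUMBER                  => 'default',
        T_DNUMBER                  => 'default',
    );

    $out = '';

    foreach (token_get_all($code) as $token) {
        /* Tokens are either array(id, text, line) or a single-character string */
        if (is_array($token)) {
            list($id, $text) = $token;
        } else {
            $id = null;
            $text = $token;
        }

        $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

        if ($id === T_WHITESPACE) {
            /* Leave whitespace unwrapped */
            $out .= $text;
        } elseif ($id !== null && isset($classes[$id])) {
            $out .= "<span class=\"{$classes[$id]}\">{$text}</span>";
        } else {
            /* Everything else (keywords, operators, punctuation) */
            $out .= "<span class=\"keyword\">{$text}</span>";
        }
    }

    return $out;
}

echo highlight_tokens('<?php echo "hi"; // greeting');
```

The upside of this route is full control over the markup; the downside is that you have to decide how every token type should be classified yourself.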

You avoid some of the cleanup by using this idea I got from Wez:

<?php
 
ini_set('highlight.comment', 'comment');
ini_set('highlight.default', 'default');
ini_set('highlight.keyword', 'keyword');
ini_set('highlight.string', 'string');
ini_set('highlight.html', 'html');
 
$code = highlight_string($code, TRUE);
 
?>

This gets rid of colors and uses meaningful names instead, but it leaves behind plenty of ugliness. If you're like me, the first thing you want to do is get rid of the extra crap that highlight_string() adds to the front and end of the string:

<?php
 
$code = substr($code, 33, -15);
 
?>

If you're using PHP 4, this is going to be different. You can do something more clever to accommodate both. I didn't.
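
If hard-coded offsets make you nervous, one alternative is to match the wrapper with a regular expression. This is only a sketch: strip_highlight_wrapper() is a hypothetical name, and it assumes the PHP 5-style wrapper (an opening code tag and span at the front, a closing span and code tag at the end). Other PHP versions produce different wrappers and would need different patterns. Unlike the substr() call above, it also removes the newline just inside each end of the wrapper:

```php
<?php

/**
 * Remove the outer wrapper that highlight_string() adds.
 * Assumes the PHP 5-style wrapper; other versions differ.
 */
function strip_highlight_wrapper($html)
{
    /* Leading <code><span style="color: ..."> plus an optional newline */
    $html = preg_replace('!^<code><span style="color: [^"]*">\n?!', '', $html);

    /* Trailing </span> and </code> plus an optional newline before them */
    $html = preg_replace('!\n?</span>\n</code>$!', '', $html);

    return $html;
}

$sample = "<code><span style=\"color: html\">\n"
        . "<span style=\"color: default\">&lt;?php&nbsp;</span>"
        . "\n</span>\n</code>";

echo strip_highlight_wrapper($sample);
```

If the wrapper doesn't match, the string is returned unchanged, which at least fails more visibly than cutting off the wrong number of characters.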

A simple replacement can turn inline styles into classes:

<?php
 
$code = str_replace('<span style="color: ', '<span class="', $code);
 
?>

If you're using PHP 4, you're going to need to do this for <font> tags instead, but it's the same basic idea.

Might as well turn &nbsp; back into a space, &amp; into &#38;, and <br /> back into a newline while you're at it:

<?php
 
$code = str_replace('&nbsp;', ' ', $code);
$code = str_replace('&amp;', '&#38;', $code);
$code = str_replace('<br />', "\n", $code);
 
?>

Now you can put the pieces together, but there's one more obstacle to overcome. The highlight_string() function closes a <span> tag just before opening the next one, sometimes several lines later. This can yield output that looks like this:

<li><code><span class="comment">...</code></li>
<li><code>...</code></li>
<li><code>...</span></code></li>

You want it to look more balanced, like this:

<li><code><span class="comment">...</span></code></li>
<li><code><span class="comment">...</span></code></li>
<li><code><span class="comment">...</span></code></li>

Feel free to solve this one on your own. (Solving this almost made me wish I had used token_get_all() instead of highlight_string().) If you're interested in seeing my solution, I've got an example that highlights itself, complete with a document type, styles, and everything else needed to make it validate as XHTML 1.0 Strict. (View source if you want to really appreciate the XHTML goodness.)
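
For reference, here is one way the balancing could be done. This is only a sketch and is not taken from the linked example. The name balance_spans() is hypothetical, and it assumes the spans are never nested and always look like the class-based spans produced by the replacements above:

```php
<?php

/**
 * Split highlighted code into lines so that every line has balanced
 * <span> tags. Assumes spans are not nested.
 */
function balance_spans($code)
{
    $open = '';          /* Opening tag carried over from the previous line */
    $balanced = array();

    foreach (explode("\n", $code) as $line) {
        /* Reopen the span that was still open when the previous line ended */
        $line = $open . $line;

        /* Find the last opening and closing tags on this line */
        $lastOpen = strrpos($line, '<span class="');
        $lastClose = strrpos($line, '</span>');

        if ($lastOpen !== false
            && ($lastClose === false || $lastClose < $lastOpen)) {
            /* A span is still open: remember its tag and close it here */
            $end = strpos($line, '>', $lastOpen);
            $open = substr($line, $lastOpen, $end - $lastOpen + 1);
            $line .= '</span>';
        } else {
            $open = '';
        }

        $balanced[] = $line;
    }

    return $balanced;
}

$code = '<span class="comment">/* one' . "\n" . ' * two' . "\n" . ' */</span>';

foreach (balance_spans($code) as $line) {
    echo "<li><code>{$line}</code></li>\n";
}
```

Because highlight_string() escapes angle brackets and quotes in the code itself, any literal span tag in the string is markup rather than part of the listing. That is what makes simple string searches workable here. A line that begins with a closing tag ends up with an empty span, which is harmless but could be cleaned up.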

Thanks to Jon Tan for the styles and colors. He's the accessibility, usability, standards, and design expert who's helping with the new site.

I'll probably be making some minor improvements to this code before using it in production on the new site. If you notice any bugs or can think of any improvements, please leave a comment. Thanks!

About this post

Formatting and Highlighting PHP Code Listings was posted on Thu, 26 Oct 2006.

36 comments

1.Pierre said:

The only problem of highlight_string is its limitation to php :-)

Take a look at GeSHi. It works very well and supports many languages and file types:

http://blog.thepimp.net/index.php/2...ter-php-version

It is easy to integrate in an existing site, everything can be changed (css or manually).

Thu, 26 Oct 2006 at 23:07:41 GMT Link


2.Chris Shiflett said:

Thanks, Pierre. I may have to give GeSHi another look.

For highlighting PHP, I'm pretty happy with my results. With barely more than 50 lines of code, I'm able to get the exact XHTML I want, and with Jon's help, it looks great. :-)

Thu, 26 Oct 2006 at 23:17:42 GMT Link


3.Ilia Alshanetsky said:

Isn't

echo "<ol class="code">\n"; and this as well echo " <li class="even">{$html['line']}</li>\n"

a parse error? :)

Also, instead of

while (strpos($code, "\n\n\n") !== FALSE) {
    $code = str_replace("\n\n\n", "\n\n", $code);
}

could you not just do:

$code = preg_replace("!\n\n\n+!", "\n\n", $code); ?

Fri, 27 Oct 2006 at 00:36:55 GMT Link


4.Chris Shiflett said:

Thanks, Ilia. I lost my backslashes in the process of copying and pasting code, and I thought I had gone back and added them back, but clearly not.

I'm using preg_replace() now, per your suggestion. That saves me another line of code. Thanks. :-)

Fri, 27 Oct 2006 at 03:31:48 GMT Link


5.Matthijs said:

Looks good Chris, thanks for the write-up. Only drawback I see in the system is when you use line-numbering it makes copy-pasting code difficult, as the numbers are copied as well, at least in Firefox (I saw that Safari doesn't copy the numbers, so maybe it's only FF). But then again for talking about the code the line numbers are useful.

Fri, 27 Oct 2006 at 06:43:09 GMT Link


6.johno said:

Try this one. http://hvge.sk/scripts/fshl/

Fri, 27 Oct 2006 at 09:05:30 GMT Link


7.Edward said:

Have a look at:

http://www.dreamprojections.com/syntaxhighlighter/

Yahoo use it on their developer network site (e.g. http://developer.yahoo.com/yui/container/overlay/index.html). I really like it and it solves the problem of copying with line numbers, as Matthijs points out. (It even has a button for copying the text to the clipboard for you!)

Fri, 27 Oct 2006 at 11:11:43 GMT Link


8.Readster said:

It looks very nice, but it is not good for copy and paste because of the line numbers. So here is my contribution to remove (toggle) the line numbers:

Give <ol> an ID

<ol class="code" id="code">

<style type="text/css" media="screen">
.codeNoLine li{
    list-style:none;
}
</style>

<script language="JavaScript" type="text/javascript">
function toggleLineNumber(){
    var c = document.getElementById('code');
    if( c.hasNumber ){
        c.className = 'code';
        c.hasNumber = false;
    }else{
        c.className = c.className+' '+'codeNoLine';
        c.hasNumber = true;
    }
}
</script>

<a href="javascript:toggleLineNumber()">Remove line numbers</a>

Readster from Germany

Fri, 27 Oct 2006 at 12:46:38 GMT Link


9.Chris Shiflett said:

Matthijs, you're right about the copy/paste problem.

I use Firefox, and I've always found that behavior in Firefox annoying, even before playing with this. You can't select the line numbers, and it's counterintuitive to copy something that can't be selected (and isn't highlighted as such). Plus, it uses # instead of actual numbers. I can't imagine that's what most people expect or want to happen.

Safari and Opera get it right. I'm not sure what IE does.

I'd love to find a transparent solution to this before using it on the new site, but if I have to choose between having line numbers and not having extra # characters appear in pasted code for Firefox users, I'll choose line numbers. Code listings here tend to be pretty small and more for discussion purposes than anything else.

Fri, 27 Oct 2006 at 14:32:16 GMT Link


10.Chris Shiflett said:

Thanks, Edward. I may have to check that out as a possible solution.

Readster, thanks for your contribution. I tried this, and it does toggle the line numbers. Unfortunately, in Firefox, even without the line numbers, I get # when I copy and paste.

There's another problem that isn't solved in my current example, and that's overflow. Sometimes people have really long lines of code, and my current solution is an ugly scroll bar.

Jon is working on a better-looking solution, and the end result will probably be a toggle link of some sort to expand the listing. Perhaps another toggle link for removing the line numbers is the best solution for this particular problem.

Fri, 27 Oct 2006 at 14:36:46 GMT Link


11.Douglas Clifton said:

Good stuff. I especially like the feature above to toggle removal of line numbers so you can c-n-p. Another simple idea is to add ids to each <li> so you can use fragment identifiers in links to the code to jump you straight to a particular location in the source.

Fri, 27 Oct 2006 at 17:45:56 GMT Link


12.Chris Shiflett said:

That's a clever idea, Douglas. I'll see what I can do. :-)

Fri, 27 Oct 2006 at 17:49:31 GMT Link


13.Douglas Clifton said:

Although in this example the markup isn't dynamically generated, and doesn't use a list, it does illustrate the concept.

http://loadaveragezero.com/vnav/lab...P/srcDTD.php#18

Notice that the line numbers themselves are navigation aids. In other words, clicking on one scrolls the page to that line number.

I generated the page directly from the working source, but it didn't make any sense to tax the server by doing it over and over, so I just spit the results out to a static page.

Fri, 27 Oct 2006 at 22:06:02 GMT Link


14.balluche said:

It's possible, for the price of a little resource consumption, to "reg-replace" some code to make the highlight_string() function work with any language, not just PHP. The idea is to remove the <?php ... ?> characters after highlighting. Note that the function needs <?php ... ?> to work properly. I worked on this some time ago, and here's the result:

if (strpos($code_to_colour, '<?php') === FALSE) {
    // If the code doesn't contain <?php, we add it to make highlighting work
    $lecode = highlight_string("<?php ".$code_to_colour." ?>",true);

    // Remove the characters we don't need
    $lecode = preg_replace('@<code><font color="#000000">.+<font color="#0000bb">&lt;\?php&nbsp;@si', '<font color="#000000">', $lecode);
    $lecode = preg_replace('@<code><font color="#000000">.+<font color="#007700">&lt;\?</font><font color="#0000bb">php&nbsp;@si', '<font color="#000000">', $lecode);
    $lecode = str_replace("<font color=\"#0000BB\">?&gt;</font>\n</font>\n</code>", "", $lecode);
    $lecode = str_replace("&nbsp;?&gt;</font>", "</font>", $lecode);

    // The &nbsp; entities keep the result from adapting to the viewer. I leave only those at the beginning of the line
    $lecode = str_replace('&nbsp;&nbsp;', '@nbsp;@nbsp;', $lecode);
    $lecode = str_replace('&nbsp;', ' ', $lecode);
    $lecode = str_replace('@nbsp;@nbsp;', '&nbsp;&nbsp;', $lecode);
} else {
    // Remove the tags that highlight_string() added
    $lecode = highlight_string($code_to_colour,true);
    $lecode = preg_replace("/(<code>|<\/code>)/","",$lecode);
}

echo $lecode;

It's not easy to make this work with all PHP versions because the output of the highlight_string() function has changed many times. It needs to be adapted for PHP 5. The code isn't "optimised" either, but that's the basic idea.

Check back later for the PHP 5 adaptation: http://balluche.free.fr/?618/Beautifier-le-code-source

(It's in French, sorry.)

Sat, 28 Oct 2006 at 14:18:17 GMT Link


15.Chris Shiflett said:

Hi Balluche,

Thanks very much for sharing!

Sat, 28 Oct 2006 at 14:42:32 GMT Link


16.Krijn Hoetmer said:

For a version which uses token_get_all(), take a look at http://krijnhoetmer.nl/stuff/php/php-highlighter/.

Sat, 04 Nov 2006 at 13:14:31 GMT Link


17.Krijn Hoetmer said:

Note to self; read comment guidelines :)

Sat, 04 Nov 2006 at 13:14:59 GMT Link


18.Chris Shiflett said:

Thanks, Krijn!

Sat, 04 Nov 2006 at 18:52:36 GMT Link


19.David said:

This is something I have been working on for a while so I am glad to find a site like yours talking about it.

Now, after downloading your example, I see that there is a problem with your code (though it might not be a problem to some people).

Your highlighter wraps everything in code, not just PHP/XHTML code.

It's the same way php.net handles it in the manual.

So how would you stop the highlighter from going beyond "< ?php" or "< code >" tags?

Otherwise you might as well clean the data then just run

<?php

function highlight_php($code) {
    $code = '<div class="php">'. highlight_string($code, true). '</div>';
    return $code;
}

?>

Also, it seems that even doing this will allow attacks so why not just go with BBcode since something like http://htmlpurifier.org/ is over 350kbs? - and yet other than removing all HTML code it is the only thing that works?

Fri, 25 May 2007 at 18:39:36 GMT Link


20.Chris Shiflett said:

Hi David,

I see that there is a problem with your code (though it might not be a problem to some people). Your highlighter wraps everything in code, not just PHP/XHTML code.

As you'll note with your own comment, not everything is enclosed in <code> tags. Only the code is.

So how would you stop the highlighter from going beyond <?php or <code> tags?

More information is available at the following URL:

http://shiflett.org/blog/2007/mar/a...-preventing-xss

Regarding your last statement, BBCode does nothing to improve security, and Paul's comment is about strip_tags().

Hope that helps.

Fri, 25 May 2007 at 18:57:16 GMT Link


21.David said:

As you'll note with your own comment, not everything is enclosed in <code> tags. Only the code is.

Hmm... well, the code in your example must be missing something, because while the code that processes your comments can tell code from text, the highlighter can't, or at least I am missing something. I have tried a couple of things, and I can't get it to stop highlighting after (or before) the code.

<blockquote><p>More information is available at the following URL:</p></blockquote>

I read the whole page and all the comments but I don't see anywhere where it talks about stopping the highlighter from highlighting everything..?

Regarding your last statement, BBCode does nothing to improve security, and Paul's comment is about strip_tags().

But since browsers ignore bbcode - it seems like the most secure way to process input is if you stripped ALL html code from the text (thereby avoiding what preinheimer was talking about) and then used something like phpBB's bbcode processor to accomplish the same thing that the 350k htmlpurifier does.

Of course this is assuming that all of the highlighting code examples I have seen on this site are incomplete (at least for a newbie like me) and we are still looking for a 99% secure way to process code - right? Or am I missing something?

Also, what about adding to your code using some kind of preg_replace() to clean out "style="color:#000;"", or "onClick="dothis()"", or "a href="javascript:alert('XSS')"" - would that work?

Fri, 25 May 2007 at 19:39:54 GMT Link


22.Chris Shiflett said:

Hi David,

Interestingly enough, I think you managed to reveal a bug in my code with your second use of the <blockquote> tag. I'll have to look into that.

Regarding how to distinguish code, I employ a style guide. Just as you did in your first comment, people who wish to include code in their comments use the <code> tag to do so. I just read my other post again, and I see that I don't explain this very well. Assuming you have the code highlighting method defined in a class, you can use a regular expression for the replacement:

<?php

$html = preg_replace('!^&lt;code&gt;((.|\n)*)&lt;\/code&gt;$!meU',
                     '$this->code(\'$1\', TRUE)',
                     $html);

?>

Hope that's a bit clearer.

But since browsers ignore bbcode - it seems like the most secure way to process input is if you stripped ALL html code from the text

If someone takes the time to comment on my blog, I think it would be rude for me to remove part of their comment, just because I'm too lazy to do the right thing. In the other post I keep referring to, I demonstrate a technique that's better than this in the first example. (It weighs in at just over 300 bytes, including comments.) The proper thing to do is escape the content for the appropriate context (and the appropriate character set).

Also, what about adding to your code using some kind of preg_replace() to clean out "style="color:#000;"", or "onClick="dothis()"", or "a href="javascript:alert('XSS')"" - would that work?

Depends on what you mean by work. If you want to keep people from talking about these things, then sure, it would work. If I did this, you wouldn't have been able to ask this question.

That's not what I want. :-)

Fri, 25 May 2007 at 21:30:53 GMT Link


23.David said:

Assuming you have the code highlighting method defined in a class, you can use a regular expression for the replacement:

That is what I was looking for :D

Your highlighter runs everything through it, but by placing it in a function I can use regex to limit it to just the text in between the opening and closing code brackets.

I also found another way to do it. Split the text up into different array elements and only run the "code" elements through the highlighter.

The rest of the code gets the old "htmlspecialchars" treatment and then, like you show in your other post (url not needed at this point), you can use more regex to only allow certain codes like "em".

Great, now I have something to play with ;)

Only the "preg_replace" you showed me didn't work, but that's fine; I just wanted the logic.

Thanks!

Fri, 25 May 2007 at 21:52:39 GMT Link


24.John Schulz said:

Hey Chris,

How have you been? ;)

What do you think of adding class="php" to your code tags?

You could do:

<code class="php">
<span class="keyword">if</span>
</code>

or:

<span class="php keyword">if</span>

The CSS would be:

.php .keyword, /* class php with descendant keyword */
.php.keyword { /* both php AND keyword in same tag */
    /* your style */
}

I used the typical code and span markup (to make sure the example got through your Friggin' Sharks With Friggin' Laser Beams form processing) but the CSS doesn't care about the tags used, only the classes applied in them.

Then when you want to post examples of crappy Ruby || Perl || JavaScript you can use different styles for them.

Perhaps you already do this but not on the comments, I'm too lazy to look.

Later,

John

Sun, 17 Jun 2007 at 23:36:41 GMT Link


25.Tim Wood said:

I want to first say thanks for a great code highlighting solution.

Also, for those of us using it for PHP files, I wanted to add a function to the processing that links function names to the manual on php.net:

function function_link($test_string) {
    $linked_string = '';
    //$manual = 'http://www.php.net/function.';
    $manual = 'http://www.php.net/';
    $linked_string = preg_replace(
        // Match a highlighted keyword
        '~([\w_]+)(\s*</span>)'.
        // Followed by a bracket
        '(\s*<span\s+class="' . $this->previous . '">\s*\()~m',
        // Replace with a link to the manual
        '<a href="' . $manual . '$1" target="_blank">$1</a>$2$3', $test_string);
    return $linked_string;
}

Didn't know if that would be of interest to anyone

Wed, 29 Aug 2007 at 11:18:43 GMT Link


26.karixma said:

How can I display code without line numbers?

Fri, 30 Nov 2007 at 11:18:26 GMT Link


27.dyron said:

Either don't use <ol>, or use CSS like:

ol { list-style: none; }

Tue, 09 Jun 2009 at 10:20:44 GMT Link

