Formatting and Highlighting PHP Code Listings

26 Oct 2006

For the impatient, here's a direct link to the example that highlights itself:

http://shiflett.org/code/highlight.php

As I mentioned in the previous post, shiflett.org is being redesigned and redeveloped from the ground up. (Nope, it's not finished yet; you'll know it when you see it.) One of the things I want to improve is commenting. This blog has been getting a lot of comments, and I really appreciate that. (Thanks!) Since the topics I talk about (PHP, MySQL, etc.) are technical, I want to let you add formatted code listings to your comments.

I've been playing with this tonight. Feel free to follow along as I go. The first thing you want to do is create an ordered list from the code you want to format ($code in these examples). This provides line numbers, among other things:

  <?php

  /* HTML Output */
  $html = array();

  /* Normalize Newlines (replace \r\n before a lone \r, so Windows line endings are not doubled) */
  $code = str_replace(array("\r\n", "\r"), "\n", $code);

  /* Collapse runs of blank lines into a single blank line */
  $code = preg_replace("!\n\n\n+!", "\n\n", $code);

  $lines = explode("\n", $code);

  /* Output Listing */
  echo "<ol class=\"code\">\n";
  foreach ($lines as $line) {
      /* Compare with '' instead of using empty(), which also treats the line "0" as empty */
      if ($line === '') {
          $html['line'] = '&#160;';
      } else {
          $html['line'] = htmlentities($line, ENT_QUOTES, 'UTF-8');
      }

      echo " <li><code>{$html['line']}</code></li>\n";
  }
  echo "</ol>\n";

  ?>

In order to make <code> tags preserve whitespace, you can add this to your CSS:

  code {
      white-space: pre;
  }

Pretty easy, right? Now that you have a good foundation, you can start to improve it. First, add class="even" to every other list item:

  <?php

  foreach ($lines as $key => $line) {
      /* Escape first, so the placeholder entity below is not escaped a second time */
      $html['line'] = htmlentities($line, ENT_QUOTES, 'UTF-8');

      if ($html['line'] === '') {
          $html['line'] = '&#160;';
      }

      /* $key is zero-based, so odd keys are the even-numbered lines */
      if ($key % 2) {
          echo " <li class=\"even\"><code>{$html['line']}</code></li>\n";
      } else {
          echo " <li><code>{$html['line']}</code></li>\n";
      }
  }

  ?>

This lets you add a subtle background color to the even rows, making the code easier to read:

  ol.code li.even {
      background: #f3f3f0;
  }

The next step is to add syntax highlighting. This is a bit more involved, but only if you're picky. (I am.) You can use token_get_all() and loop through the tokens yourself, or you can use highlight_string() and try to clean up its output. I have chosen the latter.
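
If you'd rather go the token_get_all() route, here's a rough sketch of what the loop might look like. It's not what I ended up using. The function name highlight_tokens(), the class names, and the way token types are grouped into classes are all assumptions made for illustration, and anything not listed in the map is simply treated as a keyword, which is only an approximation:

```php
<?php

/* Hypothetical helper: wrap each token in a <span> with a class name. */
function highlight_tokens($code)
{
    /* Assumed mapping of token types to class names; unlisted types become "keyword" */
    $classes = array(
        T_COMMENT                  => 'comment',
        T_DOC_COMMENT              => 'comment',
        T_CONSTANT_ENCAPSED_STRING => 'string',
        T_ENCAPSED_AND_WHITESPACE  => 'string',
        T_INLINE_HTML              => 'html',
        T_OPEN_TAG                 => 'default',
        T_CLOSE_TAG                => 'default',
        T_VARIABLE                 => 'default',
        T_STRING                   => 'default',
        T_LNUMBER                  => 'default',
        T_DNUMBER                  => 'default',
    );

    $out = '';

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            /* Most tokens are arrays: token type first, then the source text */
            list($id, $text) = $token;
        } else {
            /* Single characters such as ; and { arrive as plain strings */
            $id = null;
            $text = $token;
        }

        $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

        if ($id === T_WHITESPACE) {
            /* Whitespace needs no color, so leave it unwrapped */
            $out .= $escaped;
            continue;
        }

        $class = ($id !== null && isset($classes[$id])) ? $classes[$id] : 'keyword';
        $out .= "<span class=\"{$class}\">{$escaped}</span>";
    }

    return $out;
}
```

Because every token gets its own span, multi-line comments still have to be split across list items, but there is no wrapper markup or inline style to clean up afterward. The rest of this post sticks with highlight_string().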

You can avoid some of the cleanup by using this idea I got from Wez:

  <?php

  /* Replace the default colors with names that can double as class names */
  ini_set('highlight.comment', 'comment');
  ini_set('highlight.default', 'default');
  ini_set('highlight.keyword', 'keyword');
  ini_set('highlight.string', 'string');
  ini_set('highlight.html', 'html');

  /* Passing TRUE returns the highlighted code instead of printing it */
  $code = highlight_string($code, TRUE);

  ?>

This gets rid of colors and uses meaningful names instead, but it leaves behind plenty of ugliness. If you're like me, the first thing you want to do is get rid of the extra crap that highlight_string() adds to the front and end of the string:

  <?php

  /* Trim the wrapper markup: 33 characters from the front, 15 from the end */
  $code = substr($code, 33, -15);

  ?>

If you're using PHP 4, these offsets are going to be different, because highlight_string() wraps its output in different markup. You can do something more clever to accommodate both. I didn't.
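
For what it's worth, here's one way the "more clever" version might look. It's only a sketch, and it assumes the wrapper is always a <code> tag followed immediately by a single outer <span> (PHP 5) or <font> (PHP 4) tag, with the reverse at the end. The function name strip_highlight_wrapper() is made up for this example:

```php
<?php

/* Hypothetical helper: remove the outer wrapper without counting characters. */
function strip_highlight_wrapper($html)
{
    /* Leading <code>, then the outermost <span ...> or <font ...>, plus surrounding whitespace */
    $html = preg_replace('!^\s*<code>\s*<(?:span|font)\b[^>]*>\s*!', '', $html);

    /* The matching closing tags at the very end of the string */
    $html = preg_replace('!\s*</(?:span|font)>\s*</code>\s*$!', '', $html);

    return $html;
}
```

Matching on the tags instead of on fixed offsets means the length of the color names set with ini_set() no longer matters. If the wrapper doesn't look like either assumed form, the string comes back unchanged.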

A simple replacement can turn inline styles into classes:

  <?php

  $code = str_replace('<span style="color: ', '<span class="', $code);

  ?>

If you're using PHP 4, you're going to need to do this for <font> tags instead, but it's the same basic idea.
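
I haven't tested this on PHP 4, so treat it as a sketch. It assumes the PHP 4 output uses tags of the form <font color="..."> and that the color names have already been swapped for class names with ini_set(). The function name font_to_span() is made up for this example:

```php
<?php

/* Hypothetical helper: turn <font color="name"> tags into <span class="name"> tags. */
function font_to_span($html)
{
    /* The captured color name becomes the class name */
    $html = preg_replace('!<font color="([^"]*)">!', '<span class="$1">', $html);

    /* Closing tags need the same treatment */
    return str_replace('</font>', '</span>', $html);
}
```

After this, the PHP 4 and PHP 5 output should have the same shape, so the remaining steps don't need to care which version produced it.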

Might as well turn &nbsp; back into a space, &amp; into &#38;, and <br /> back into a newline while you're at it:

  <?php

  $code = str_replace('&nbsp;', ' ', $code);
  $code = str_replace('&amp;', '&#38;', $code);
  $code = str_replace('<br />', "\n", $code);

  ?>

Now you can put the pieces together, but there's one more obstacle to overcome. The highlight_string() function closes a <span> tag just before opening the next one, sometimes several lines later. This can yield output that looks like this:

  <li><code><span class="comment">...</code></li>
  <li><code>...</code></li>
  <li><code>...</span></code></li>

You want it to look more balanced, like this:

  <li><code><span class="comment">...</span></code></li>
  <li><code><span class="comment">...</span></code></li>
  <li><code><span class="comment">...</span></code></li>

Feel free to solve this one on your own. (Solving this almost made me wish I had used token_get_all() instead of highlight_string().) If you're interested in seeing my solution, I've got an example that highlights itself, complete with a document type, styles, and everything else needed to make it validate as XHTML 1.0 Strict. (View source if you want to really appreciate the XHTML goodness.)
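
If you want a starting point, here's a minimal sketch of one possible approach. It isn't necessarily how the linked example does it. It assumes the highlighted code has already been split into lines with explode("\n", $code), and the function name balance_spans() is made up for this example. The idea is to remember which <span> tags are still open at the end of each line, close them there, and reopen them at the start of the next line:

```php
<?php

/* Hypothetical helper: make every line open and close its own <span> tags. */
function balance_spans(array $lines)
{
    $open = array();      /* opening tags that are still unclosed */
    $balanced = array();

    foreach ($lines as $line) {
        /* Reopen whatever was left open by the previous lines */
        $prefix = implode('', $open);

        /* Walk through this line's tags in order, tracking what remains open */
        preg_match_all('!<span\b[^>]*>|</span>!', $line, $matches);
        foreach ($matches[0] as $tag) {
            if ($tag === '</span>') {
                array_pop($open);
            } else {
                $open[] = $tag;
            }
        }

        /* Close everything that is still open at the end of the line */
        $balanced[] = $prefix . $line . str_repeat('</span>', count($open));
    }

    return $balanced;
}
```

One side effect is that a line beginning with a closing tag ends up with an empty span at the front. That is harmless to the markup, but you may want to strip those out if you're as picky as I am.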

Thanks to Jon Tan for the styles and colors. He's the accessibility, usability, standards, and design expert who's helping with the new site.

I'll probably be making some minor improvements to this code before using it in production on the new site. If you notice any bugs or can think of any improvements, please leave a comment. Thanks!