PHP Advent Calendar Day 3

03 Dec 2007

Today's entry is provided by Sebastian Bergmann.

Name: Sebastian Bergmann
Blog: sebastian-bergmann.de
Biography: Sebastian Bergmann is a long-time contributor to various PHP projects, including PHP itself. He is the developer of PHPUnit and offers consulting, training, and coaching services to help enterprises improve the quality assurance process for their PHP-based software projects.
Location: Siegburg, Germany

Where do most bugs hide in a software project? A small PHP script can help answer this question by mining a version control repository for the relevant information. This works only if three things hold: the project is managed with version control software, bug-fix commits use a consistent message format, and each bug-fix commit touches only the source code files relevant to that fix.

So, let us assume that we are using Subversion to manage our project's source code, and that we use messages such as "Fix #2204." when a bug fix is committed. We also assume that this script has filesystem access to the Subversion repository. We start with some configuration (repository location) and variable initialization:

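    <?php

    // Configure the repository location.
    $repository = '/var/svn/phpunit';

    // Maps each changed path to the list of bug-fix commits that touched it.
    $paths = array();

    // Resolve to an absolute path; realpath() returns false if the path does not exist.
    $repository = realpath($repository);

    ?>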
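Before querying the repository, it can help to pin down the commit-message convention assumed above. The following is a minimal sketch, not part of the original script: `extractTicket()` is a hypothetical helper name, and the sample messages are made up. It uses nullable return types, so it assumes PHP 7.1 or later.

```php
<?php

// Hypothetical helper: returns the ticket number referenced by a bug-fix
// commit message, or null if the message does not follow the convention.
function extractTicket(string $message): ?int
{
    // Require at least one digit after "Fix #"; the /i flag makes the
    // match case-insensitive, so "fix #12" is accepted as well.
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        return (int) $matches[1];
    }

    return null;
}

var_dump(extractTicket('Fix #2204.'));          // int(2204)
var_dump(extractTicket('Refactor the runner')); // NULL
var_dump(extractTicket('Fix # typo'));          // NULL
```

Requiring at least one digit keeps a message such as "Fix # typo" from being recorded as ticket 0.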

The first step is to find all commits in the repository whose commit message matches our bug fix format. The svn log command can help us here: it shows log messages from the repository and can optionally emit them in XML format. PHP's SimpleXML extension provides a simple way to parse that XML.

In our script, we use shell_exec() to run the svn log --xml command on our repository. The generated XML is then loaded via simplexml_load_string() into an object that we can iterate.

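    <?php

    // Ask Subversion for the complete commit log as XML. escapeshellarg()
    // keeps unusual characters in the path from being interpreted by the shell.
    $log = simplexml_load_string(
        shell_exec('svn log --xml ' . escapeshellarg('file://' . $repository))
    );

    ?>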
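To see what the loaded object looks like without a repository at hand, the sketch below feeds SimpleXML a hand-written sample in the shape of `svn log --xml` output. The revisions, authors, dates, and messages are invented for illustration, and the `$entries` array is only a convenience for inspecting the result.

```php
<?php

// Hand-written sample in the shape of `svn log --xml` output.
// All values are made up for illustration.
$xml = <<<XML
<log>
<logentry revision="1502">
<author>alice</author>
<date>2007-11-30T10:15:00.000000Z</date>
<msg>Fix #2204.</msg>
</logentry>
<logentry revision="1501">
<author>bob</author>
<date>2007-11-29T08:00:00.000000Z</date>
<msg>Add a new assertion.</msg>
</logentry>
</log>
XML;

$log = simplexml_load_string($xml);

$entries = array();

foreach ($log->logentry as $logentry) {
    $entries[] = array(
        // Attributes are read with array syntax, child elements as properties.
        'revision' => (int) $logentry['revision'],
        'message'  => (string) $logentry->msg,
    );
}

print_r($entries);
```

The explicit casts matter: without them, the array would hold SimpleXMLElement objects rather than plain integers and strings.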

For each revision that matches our search criteria, we use the svnlook changed command to get the paths that were changed in that particular revision.

    <?php

    foreach ($log->logentry as $logentry) {
        $attributes = $logentry->attributes();
        $revision = (int)$attributes['revision'];
        $message = (string)$logentry->msg;

        // Only consider commits whose message references a ticket, e.g. "Fix #2204."
        // The "+" requires at least one digit after the "#".
        if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
            $ticket = (int)$matches[1];

            // svnlook prints one changed path per line.
            $changedPaths = explode(
                "\n",
                shell_exec(
                    sprintf(
                        'svnlook changed -r %d %s',
                        $revision,
                        escapeshellarg($repository)
                    )
                )
            );

            // Drop the empty element left by the trailing newline.
            unset($changedPaths[count($changedPaths) - 1]);

            foreach ($changedPaths as $changedPath) {
                // Strip the status columns (e.g. "U   ") that precede the path.
                $changedPath = substr($changedPath, 4);

                // PHP creates the inner array on first use, so no isset() check is needed.
                $paths[$changedPath][] = array(
                    'revision' => $revision,
                    'ticket' => $ticket
                );
            }
        }
    }

    ?>
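The path extraction in the loop above can be tried in isolation. The following sketch assumes the usual `svnlook changed` layout, in which status flags occupy the first two columns, the path starts at the fifth, and directory entries end with a slash. `parseChangedPaths()` is a hypothetical helper, the sample output is made up, and skipping directories is an assumption of this sketch rather than something the script above does.

```php
<?php

// Hypothetical helper: turns raw `svnlook changed` output into a list of
// file paths.
function parseChangedPaths(string $output): array
{
    $paths = array();

    foreach (explode("\n", $output) as $line) {
        // Skip blank lines (such as the one left by the trailing newline)
        // and anything too short to contain a path.
        if (strlen($line) <= 4) {
            continue;
        }

        // Columns 0-1 hold the status flags, columns 2-3 are padding.
        $path = substr($line, 4);

        // Assumption: directory entries (trailing slash) are not of interest.
        if (substr($path, -1) === '/') {
            continue;
        }

        $paths[] = $path;
    }

    return $paths;
}

// Made-up sample output for a single revision.
$sample = "U   trunk/src/Foo.php\n"
        . "A   trunk/tests/\n"
        . "A   trunk/tests/FooTest.php\n";

print_r(parseChangedPaths($sample));
```

Filtering on the line length instead of removing the last array element also copes with output that contains no trailing newline.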

For each source code file that is changed at least once as part of a bug fix, we maintain an array with the revision and ticket number of each such change. Finally, we use uasort() to sort the files by the number of bug-fix commits that touched them, in descending order, and print the resulting list.

    <?php

    // Sort by number of recorded bug fixes while keeping the path keys.
    uasort($paths, 'cmp');

    foreach ($paths as $path => $data) {
        printf("%4d: %s\n", count($data), $path);
    }

    // Comparison callback: paths with more bug fixes sort first.
    function cmp($a, $b)
    {
        $a = count($a);
        $b = count($b);

        if ($a == $b) {
            return 0;
        }

        return ($a > $b) ? -1 : 1;
    }

    ?>
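Because the script records one entry per commit, a ticket that needed several commits is counted several times for the same file. One possible variation, sketched here under that observation, ranks files by the number of distinct tickets instead. `rankByTickets()` is a hypothetical helper, the sample data is made up, and `array_column()` requires PHP 5.5 or later.

```php
<?php

// Made-up sample in the shape of the $paths array built by the script.
$paths = array(
    'trunk/src/Foo.php' => array(
        array('revision' => 1502, 'ticket' => 2204),
    ),
    'trunk/src/Bar.php' => array(
        array('revision' => 1498, 'ticket' => 2190),
        array('revision' => 1502, 'ticket' => 2204),
        array('revision' => 1510, 'ticket' => 2204),
    ),
);

// Hypothetical helper: maps each path to its number of distinct tickets,
// sorted with the highest count first.
function rankByTickets(array $paths): array
{
    $ranking = array();

    foreach ($paths as $path => $fixes) {
        // Collect the ticket numbers and drop duplicates before counting.
        $ranking[$path] = count(array_unique(array_column($fixes, 'ticket')));
    }

    // arsort() sorts by value in descending order and preserves the keys.
    arsort($ranking);

    return $ranking;
}

foreach (rankByTickets($paths) as $path => $tickets) {
    printf("%4d: %s\n", $tickets, $path);
}
```

In the sample, trunk/src/Bar.php was touched by three bug-fix commits but only two distinct tickets, so it is listed with a count of 2, ahead of trunk/src/Foo.php with 1.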

This entry shows how easily PHP can parse XML data to solve a problem that might look hard at first glance: mining a code repository for data that maps past bugs to source code files. The resulting ranking of the most bug-prone source code files is a good basis for deciding which parts of your code base need more tests.

If this has sparked your interest in quality assurance for PHP projects, have a look at the PHPUnit and phpUnderControl projects.