About the Author

Chris Shiflett

Chris Shiflett is an author and speaker who leads the web application security practice at OmniTI.


PHP Advent Calendar Day 3

Today's entry is provided by Sebastian Bergmann.

Sebastian Bergmann

Name: Sebastian Bergmann
Blog: sebastian-bergmann.de
Biography: Sebastian Bergmann is a long-time contributor to various PHP projects, including PHP itself. He is the developer of PHPUnit and offers consulting, training, and coaching services to help enterprises improve the quality assurance process for their PHP-based software projects.
Location: Siegburg, Germany

Where do most bugs hide in a software project? A small script written in PHP can help us answer this question by mining a version control repository for the relevant information. This assumes, of course, that you manage your project with version control software, that you use a consistent message format when you commit a bug fix, and that each such commit touches only the source code files relevant to that fix.

So, let us assume that we are using Subversion to manage our project's source code, and that we use messages such as "Fix #2204." when a bug fix is committed. We also assume that this script has filesystem access to the Subversion repository. We start with some configuration (repository location) and variable initialization:

<?php
 
// Configure the repository location.
$repository = '/var/svn/phpunit';
 
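$paths      = array();
$repository = realpath($repository);
 
// realpath() returns false when the path does not exist.
if ($repository === false) {
    die("The repository path does not exist.\n");
}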
 
?>

The first step is to find all commits whose message matches our bug fix format. The svn log command can help us here: it shows the log messages from the repository and, optionally, does so in XML format. PHP's SimpleXML extension provides a simple, easy-to-use toolset for parsing XML.

In our script, we use shell_exec() to run the svn log --xml command on our repository. The generated XML is then loaded via simplexml_load_string() into an object that we can iterate.

<?php
 
// Escape the repository URL before it is handed to the shell.
$log = simplexml_load_string(
    shell_exec(
        sprintf('svn log --xml %s', escapeshellarg('file://' . $repository))
    )
);
 
?>
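
To make the shape of the data concrete, here is a minimal sketch that parses a hand-written sample instead of calling svn. The sample mimics the general layout of svn log --xml output (a log element that contains logentry elements, each with a revision attribute and a msg child). The revisions, authors, dates, and messages are invented for illustration.

```php
<?php

// Hand-written sample that mimics the layout of "svn log --xml" output.
// All revisions, authors, dates, and messages are made up.
$xml = <<<XML
<log>
<logentry revision="1502">
<author>alice</author>
<date>2007-11-30T18:02:11.000000Z</date>
<msg>Fix #2204.</msg>
</logentry>
<logentry revision="1501">
<author>bob</author>
<date>2007-11-29T09:15:42.000000Z</date>
<msg>Update the documentation.</msg>
</logentry>
</log>
XML;

$log = simplexml_load_string($xml);

// simplexml_load_string() returns false when the XML cannot be parsed.
if ($log === false) {
    die("The XML could not be parsed.\n");
}

$entries = array();

foreach ($log->logentry as $logentry) {
    $entries[] = array(
        // Attributes are read with array syntax, child elements as properties.
        'revision' => (int)$logentry['revision'],
        'message'  => (string)$logentry->msg
    );
}

foreach ($entries as $entry) {
    printf("r%d: %s\n", $entry['revision'], $entry['message']);
}
```

Run against this sample, the sketch prints one line per log entry, starting with r1502: Fix #2204. for the first one.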

For each revision that matches our search criteria, we use the svnlook changed command to get the paths that were changed in that particular revision.

<?php
 
foreach ($log->logentry as $logentry) {
    $attributes = $logentry->attributes();
    $revision   = (int)$attributes['revision'];
    $message    = (string)$logentry->msg;
 
    // Require at least one digit so that a bare "Fix #" is not counted.
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        $ticket = (int)$matches[1];
 
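        // Ask svnlook for the paths that were touched by this revision.
        $changedPaths = explode(
            "\n",
            shell_exec(
                sprintf(
                    'svnlook changed -r %d %s',
                    $revision,
                    escapeshellarg($repository)
                )
            )
        );
 
        // The output ends with a newline, so drop the empty last element.
        unset($changedPaths[count($changedPaths) - 1]);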
 
        foreach ($changedPaths as $changedPath) {
            // Strip the status columns (for example "U   ") before the path.
            $changedPath = substr($changedPath, 4);
 
            if (!isset($paths[$changedPath])) {
                $paths[$changedPath] = array(
                    array(
                        'revision' => $revision,
                        'ticket'   => $ticket
                    )
                );
            } else {
                $paths[$changedPath][] = array(
                    'revision' => $revision,
                    'ticket'   => $ticket
                );
            }
        }
    }
}
 
?>
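
The two text-handling steps in this loop can be tried in isolation. The following sketch wraps them in two helper functions, extractTicket() and parseChangedPaths(); both names are hypothetical and do not appear in the script above. It assumes that each line of svnlook changed output carries its status in the first four columns, which is the same assumption the substr() call above makes. The sample paths are made up.

```php
<?php

// Hypothetical helper: returns the ticket number referenced by a bug fix
// commit message, or null when the message does not follow the convention.
function extractTicket($message)
{
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        return (int)$matches[1];
    }

    return null;
}

// Hypothetical helper: turns raw "svnlook changed" output into a list of
// paths by skipping empty lines and stripping the four status columns.
function parseChangedPaths($output)
{
    $paths = array();

    foreach (explode("\n", $output) as $line) {
        if ($line === '') {
            continue;
        }

        $paths[] = substr($line, 4);
    }

    return $paths;
}

// Made-up sample in the layout of "svnlook changed" output.
$sample = "U   trunk/Foo.php\nA   trunk/Tests/FooTest.php\n";

var_dump(extractTicket('Fix #2204.'));
var_dump(extractTicket('Update the documentation.'));
print_r(parseChangedPaths($sample));
```

Skipping empty lines, rather than unsetting the last array element, also copes with output that is empty or lacks a trailing newline.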

For each source code file that is changed at least once as part of a bug fix, we maintain an array with the revision and ticket number of each fix. Finally, we use uasort() to sort that array and print the source code files that were involved in a bug fix, in descending order of the number of fixes.

<?php
 
uasort($paths, 'cmp');
 
foreach ($paths as $path => $data) {
    printf("%4d: %s\n", count($data), $path);
}
 
function cmp($a, $b)
{
    $a = count($a);
    $b = count($b);
 
    if ($a == $b) {
        return 0;
    }
 
    return ($a > $b) ? -1 : 1;
}
 
?>
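
As a worked example, the following sketch applies the same sorting approach to a small, made-up array in the shape that the loop above builds. The paths, revisions, and ticket numbers are invented, and compareByFixCount() is a hypothetical name for the comparison function.

```php
<?php

// Made-up sample data: path => list of bug fixes that touched that path.
$paths = array(
    'trunk/Foo.php' => array(
        array('revision' => 10, 'ticket' => 1)
    ),
    'trunk/Bar.php' => array(
        array('revision' => 11, 'ticket' => 2),
        array('revision' => 12, 'ticket' => 3),
        array('revision' => 15, 'ticket' => 5)
    ),
    'trunk/Baz.php' => array(
        array('revision' => 12, 'ticket' => 3),
        array('revision' => 14, 'ticket' => 4)
    )
);

// Orders two fix lists so that the longer one comes first.
function compareByFixCount($a, $b)
{
    $a = count($a);
    $b = count($b);

    if ($a == $b) {
        return 0;
    }

    return ($a > $b) ? -1 : 1;
}

// uasort() sorts by value and keeps the path keys attached to their data.
uasort($paths, 'compareByFixCount');

$ranking = array();

foreach ($paths as $path => $data) {
    $ranking[] = sprintf("%4d: %s", count($data), $path);
}

echo implode("\n", $ranking), "\n";

// Prints:
//    3: trunk/Bar.php
//    2: trunk/Baz.php
//    1: trunk/Foo.php
```

uasort() is used rather than usort() because the keys of the array are the paths, and usort() would discard them.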

This entry shows how easily XML data can be parsed with PHP in order to solve a problem that might look hard at first glance: mining a code repository for data that maps past bugs to source code files. The resulting ranking of the most bug-prone source code files is a good basis for deciding which parts of your code base need more tests.

If this got you interested in quality assurance for PHP projects, you might be interested in the PHPUnit and phpUnderControl projects.
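
The first comment below notes that svn log --verbose --xml already includes the changed paths, which would make the call to svnlook unnecessary. A minimal sketch of that variant follows. It parses a hand-written sample that mimics the layout of the verbose XML output (a paths element with one path child per changed file); the revisions, messages, and paths are invented, and this is not the author's final script.

```php
<?php

// Hand-written sample in the layout of "svn log --verbose --xml" output.
// All revisions, messages, and paths are made up.
$xml = <<<XML
<log>
<logentry revision="1502">
<paths>
<path action="M">/trunk/Foo.php</path>
<path action="M">/trunk/Bar.php</path>
</paths>
<msg>Fix #2204.</msg>
</logentry>
<logentry revision="1501">
<paths>
<path action="M">/trunk/Baz.php</path>
</paths>
<msg>Update the documentation.</msg>
</logentry>
<logentry revision="1500">
<paths>
<path action="M">/trunk/Foo.php</path>
</paths>
<msg>Fix #2198.</msg>
</logentry>
</log>
XML;

$log   = simplexml_load_string($xml);
$paths = array();

foreach ($log->logentry as $logentry) {
    // Skip every commit that is not marked as a bug fix.
    if (!preg_match('/Fix #([0-9]+)/i', (string)$logentry->msg, $matches)) {
        continue;
    }

    // The changed paths are part of the log entry itself.
    foreach ($logentry->paths->path as $path) {
        $paths[(string)$path][] = array(
            'revision' => (int)$logentry['revision'],
            'ticket'   => (int)$matches[1]
        );
    }
}

foreach ($paths as $path => $data) {
    printf("%4d: %s\n", count($data), $path);
}
```

In a real run, the sample string would be replaced with the output of shell_exec() for svn log --verbose --xml, with the repository URL passed through escapeshellarg(). Because svn log also accepts remote repository URLs, this variant does not need filesystem access to the repository.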

About This Post

PHP Advent Calendar Day 3 was posted on Mon, 03 Dec 2007 at 19:18:03 GMT.

9 Comments

1. Sebastian Bergmann said:

I just discovered that svn log --verbose --xml includes the changed paths information in the XML log. This means that the call to svnlook is not necessary and the script does not need local access to the repository.

Tue, 04 Dec 2007 at 06:49:48 GMT


2. Uzi said:

This script is really pointless because no one names their subversion commits with names like "Fix #2244"

Besides, PHP is not C. PHP coders don't normally use functions like printf() because we can avoid them.

Tue, 04 Dec 2007 at 10:26:56 GMT


3. Jamie L said:

Well, the "Fix #XXX" has been popularized recently by Project Management portals like Trac which create handy shortcut links, but yes I'm sure there are very few projects (other than a handful of respected Open Source Projects eg. PHPUnit) where the developers will do a single commit per bug fix and adhere to a standard naming convention for these commits.

Tue, 04 Dec 2007 at 10:41:07 GMT


4. Jamie L said:

But perhaps therein lies the tip for today :)

"Standardize your bug fixing conventions, and thou shalt get statistics"

Tue, 04 Dec 2007 at 10:43:38 GMT


5. Sebastian Bergmann said:

Most of the companies I visited this year adhere to a standard such as the one mentioned in the posting for bugfix commit messages.

And as Jamie mentions, every project that uses Trac is likely to use the format used in the script to get the benefit of a Trac feature.

Tue, 04 Dec 2007 at 12:00:57 GMT


6. Lars Strojny said:

Trac comes with two pretty cool scripts which help to enforce the "single bugfix per commit" rule. trac-pre-commit-hook checks whether the commit message includes something like "fixes #123", "closes #123", or "refs #123", and trac-post-commit-hook changes the related ticket accordingly (it closes the ticket when a fix is committed, and adds a reference to the commit when the ticket is just referenced). You can find the scripts here: http://trac.edgewall.org/browser/trunk/contrib

Tue, 04 Dec 2007 at 12:51:47 GMT


7. Sean Coates said:

We use the "fix #123" "fixes #234" "see #456" "re 567" notation extensively, internally. It makes trac much nicer to work with, and svn's event hooks are just awesome.

S

Tue, 04 Dec 2007 at 20:14:14 GMT


8. Olle Jonsson said:

Thanks Sebastian, for a dip into what SimpleXML holds. The "wet-finger-in-the-air" metric that this script gives is very neat. Is the code copy-pasteable in full anywhere?

Give something (follow a strict convention), get something (greppable, mineable datasets). Or as your Dad would've said "Quid pro quo".

And, +1: That Trac postcommit hook revolutionized the usage of atomic commits at my workplace, too.

Wed, 05 Dec 2007 at 08:54:25 GMT


9. Sebastian Bergmann said:

The current version of the full script can be found here.

Thu, 06 Dec 2007 at 08:23:48 GMT


