About the Author

Chris Shiflett

Hi, I'm Chris, a web developer and a founding member of Analog. I live and work in Brooklyn, NY.


All Posts for Dec 2007

PHP Advent Calendar 2007

Thanks very much to everyone who participated in this year's PHP Advent Calendar. The entire calendar is available at the following URL:

http://shiflett.org/blog/2007/dec

For reference, the complete list of entries is below. (See also Chris Cornutt's list and Sean Coates's calendar.)

  1. Sean Coates

  2. Elizabeth Naramore (Writing Code is Like Doing the Dishes (5 Reasons Why Documenting Your Code Makes You a Better Coder))

  3. Sebastian Bergmann

  4. James McGlinn

  5. Cal Evans (Five Resources Every PHP Developer Should Know About)

  6. Davey Shafik (APIs, UIs, and Other Underused Acronyms)

  7. Elizabeth Smith (SPL to the Rescue)

  8. Matthew Weier O'Phinney (Don't Reinvent the Wheel)

  9. Ivo Jansch (Design Patterns)

  10. Chris Cornutt

  11. Ben Ramsey

  12. Ed Finkler

  13. Terry Chay (Filter Input; Escape Output: Security Principles and Practice)

  14. David Sklar (Timing and Profiling)

  15. Paul Reinheimer (Channels and Output)

  16. Jeff Moore (What We Can Learn about Software Development from a Failing Restaurant)

  17. Ilia Alshanetsky

  18. Christian Wenz (WSDL Despite PHP 5)

  19. Marcus Börger

  20. Adam Trachtenberg (User-Defined Functions in SQLite)

  21. Luke Welling (Following the Big Dogs on Web Application Security)

  22. Derick Rethans

  23. Jay Pipes

  24. Nate Abele

Coordinating this turned out to be a lot of work, but I hope to do it again next year. There are lots of people in the PHP community who have something useful to share, and one reason to continue putting this calendar together each year is to get some original content published in December, a month when many people get busy, and blogs go dormant. With a little bit of prodding, we all hopefully learned a little more than we would have otherwise, and the people who were gracious enough to share something deserve our thanks.

Happy holidays, everyone. See you in 2008.

PHP Advent Calendar Day 24

Today's entry is provided by Nate Abele.

Nate Abele

Name: Nate Abele
Blog: cake.insertdesignhere.com
Biography: Nate Abele of OmniTI has been a core developer of the CakePHP web framework for over two years. He is known in some circles as the Johnny Cash (or "Man in Black") of the PHP community.
Location: New York, New York

Today, dear readers, I offer you no lofty words of wisdom, no dishwasher analogies, and no deep thoughts to ponder. What I can offer is a piece of simple, practical advice that you can use today, and some bits of code that I humbly submit to you below. Without further ado, let's begin.

Late last year, Chris and I (mostly Chris) came up with a clever defense-in-depth strategy for session security over a couple of beers. (We were sober most of the time; why do you ask?) The basic idea is that when dealing with session security, you want to do the best you can to ensure that the authenticated user you started talking to is the same user you're talking to now. You're probably familiar with at least some of the various forms of session attacks, so I won't bore you with the details.

There are all sorts of details in each request that you can potentially use to help be sure that you're still talking to the same user. Examples include the user's IP address and the User-Agent header. Hopefully you know that most of these details aren't reliable, because proxies and various other factors can change these things over the course of a legitimate user's session, causing you to potentially mistake a good guy for a bad guy. Common wisdom is that you can therefore never use such information.

The solution might not be black and white. If the user's IP address has been consistent over the last 50 requests, is it reasonable to assume that it's going to be the same for the next request? If it's suddenly not the same, then it seems reasonable to treat the request with a modicum of suspicion. To mitigate the risk, all you need to do is ask the user to re-authenticate. This can be as simple as prompting for the password again, a minor inconvenience for legitimate users, but a big hindrance to attackers.

In order to implement this simple system, you need two things:

  • A piece of information (key) you believe can be consistent, at least for a reasonably large portion of your users. (For example, an IP address.)

  • A threshold, after which you believe you can rely on this piece of information to be consistent in future requests.

The threshold is the important part; it's what makes this idea work. The threshold can be either a number of requests or an interval of time, but after this threshold has been met, the key is considered trustworthy, and any changes to its value will raise a red flag. Note that it's important to properly tune your threshold rules so that they're appropriate for your application, your users, and the piece of information you're tracking. Analyze your log files, and find a sensible balance.
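To make the threshold idea concrete before tying it to a framework, here is a minimal sketch in plain PHP. The function name track_key() and the shape of the state array are assumptions made for illustration; they are not part of the component shown later, and only a request-count threshold is handled.

```php
<?php
// Hypothetical helper (not part of the component in this post): tracks one
// key, such as an IP address, against a request-count threshold.
function track_key(array $state, string $value, int $threshold): array
{
    // First request: remember the value and start counting.
    if (!isset($state['value'])) {
        return ['value' => $value, 'count' => 1, 'suspicious' => false];
    }

    $trusted = $state['count'] >= $threshold;

    if ($state['value'] === $value) {
        // Consistent value: keep counting until the threshold is met.
        $count = $trusted ? $state['count'] : $state['count'] + 1;
        return ['value' => $value, 'count' => $count, 'suspicious' => false];
    }

    if ($trusted) {
        // The trend was established and is now broken: raise a red flag
        // (for example, ask the user to re-authenticate).
        return ['value' => $state['value'], 'count' => $state['count'], 'suspicious' => true];
    }

    // No trend yet: start over with the new value.
    return ['value' => $value, 'count' => 1, 'suspicious' => false];
}

// Usage: three consistent requests establish the trend, the fourth breaks it.
$state = [];
foreach (['10.0.0.5', '10.0.0.5', '10.0.0.5', '172.16.9.1'] as $ip) {
    $state = track_key($state, $ip, 3);
}
var_dump($state['suspicious']); // bool(true)
```

In a real application the state would live in the session, and a change before the threshold is met simply restarts the count instead of raising a flag.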

While the concept itself is simple, and can be implemented in any number of ways, I've created it as a component for CakePHP that you can use as follows:

<?php
 
class AccountsController extends AppController {
    var $components = array('Trending');
 
    function beforeFilter () {
        $this->Trending->track(array('User-Agent' => 20,
                                     'Net:!' => '+30 minutes'));
    }
 
    function index() {
        /* ... */
    }
 
    function edit($id = null) {
        /* ... */
    }
 
    /* ... */
}
 
?>

This example sets up two rules: one for the User-Agent header, and one for the IP address. In the context of the domain-specific language I've constructed for these rules, : denotes a type of matching other than HTTP headers (e.g., Net), and ! means an exact match. So, this example only establishes a trend if the IP address is an exact match during a 30-minute period. For specifying thresholds, use strings to represent time (relative to now), and integers to represent a number of requests. So, this example establishes another trend if the User-Agent header remains consistent for 20 requests.
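As a rough illustration of how such rule keys can be read, the sketch below splits a rule into its matching type, its match pattern, and the unit of its threshold. The function describe_rule() and the array it returns are hypothetical and exist only for this example; the component's own parsing lives in track() in Listing 1.

```php
<?php
// Hypothetical parser for illustration; the component's own parsing is in track().
function describe_rule(string $criteria, $threshold): array
{
    if (strpos($criteria, ':') === false) {
        // Plain names are HTTP headers, looked up as HTTP_* server variables.
        $type = 'Header';
        $match = 'HTTP_' . strtoupper(str_replace(['-', ' '], '_', $criteria));
    } else {
        // "Type:match", e.g. "Net:!" for an exact IP address match.
        list($type, $match) = explode(':', $criteria, 2);
    }

    return [
        'type' => $type,
        'match' => $match,
        // Integers count requests; strings are times relative to now.
        'unit' => is_int($threshold) ? 'requests' : 'time',
        'threshold' => $threshold,
    ];
}

print_r(describe_rule('User-Agent', 20));
print_r(describe_rule('Net:!', '+30 minutes'));
```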

If either of these values changes after the threshold is met, the trending component simply blackholes the request by sending back an error page and preventing any further execution. However, it has a callback property which can be used for more graceful handling, such as redirecting to a login page.

For reference, the current implementation is provided in Listing 1.

Listing 1:

<?php
/* SVN FILE: $Id: session_history.php 5112 2007-05-15 20:01:44Z nate $ */
/**
 * Define session history rules to mitigate session attacks
 *
 * Concept by Chris Shiflett
 *
 * PHP versions 4 and 5
 *
 * CakePHP(tm) :  Rapid Development Framework <http://www.cakephp.org/>
 * Copyright 2005-2007, Cake Software Foundation, Inc.
 *                                1785 E. Sahara Avenue, Suite 490-204
 *                                Las Vegas, Nevada 89104
 *
 * Licensed under The MIT License
 * Redistributions of files must retain the above copyright notice.
 *
 * @filesource
 * @copyright        Copyright 2005-2007, Cake Software Foundation, Inc.
 * @link                http://www.cakefoundation.org/projects/info/cakephp CakePHP(tm) Project
 * @package            cake
 * @subpackage        cake.cake.libs.controller.components
 * @since            CakePHP(tm) v 0.10.0.1076
 * @version            $Revision: 5112 $
 * @modifiedby        $LastChangedBy: nate $
 * @lastmodified    $Date: 2007-05-15 16:01:44 -0400 (Tue, 15 May 2007) $
 * @license            http://www.opensource.org/licenses/mit-license.php The MIT License
 */
uses('security');
 
/**
 * Allows the user to define historical tracking thresholds for trending commonly-repeated HTTP
 * request headers.
 *
 * Also allows for defining handling rules for when trends are violated.
 *
 * @package        cake
 * @subpackage    cake.cake.libs.controller.components
 */
class TrendingComponent extends Object {
/**
 * Components used by this class
 *
 * @var array
 */
    var $components = array('Session', 'Security', 'RequestHandler');
/**
 * Contains the rules...
 *
 * @var array
 */
    var $rules = array();
/**
 * After checks have been run, this holds any rules which have met the threshold but failed validation
 *
 * @var array
 */
    var $failures = array();
/**
 * A controller method to call should the validation fail
 *
 * @var string
 */
    var $callback = null;
/**
 * Sets trending threshold / violation handling rules for trending
 * request data
 *
 * @param array $rules An array of rules, keyed by HTTP header name (or other criteria)
 * @return void
 */
    function track($rules) {
        $_rules = array();
 
        foreach ($rules as $criteria => $rule) {
            if (is_int($criteria)) {
                $criteria = $rule;
                $rule = 20;
            }
            $rule = is_array($rule) ? $rule : array('threshold' => $rule);
            $key = $criteria;
 
            if (strpos($criteria, ':') === false) {
                $criteria = 'HTTP_' . up(r(array('-', ' '), '_', $criteria));
            } elseif (strpos($criteria, 'Net:') === 0) {
                /*
                 * Network class rules:
                 * Net:domain
                 * Net:subnet
                 * Net:{3}.{3}.{~20}.* (Where * is a wildcard match, and ~20 represents a range [+/- 20])
                 *
                 * The only rule actually defined so far is "!", which means an exact match
                 * @todo Finish the parsing/matching code for network class rules
                 */
            } else {
                // Other types of rules
            }
            $rule['key'] = $key;
            $_rules[$criteria] = $rule;
        }
        $this->rules = $_rules;
    }
/**
 * Checks rule definitions setup in track(), and manages increments/removals for
 * rules below the threshold.
 *
 * @param object $controller
 * @return void
 */
    function startup(&$controller) {
        $this->failures = $this->incrementAndInvalidate();
 
        if (!empty($this->failures)) {
            if (!empty($this->callback)) {
                $controller->{$this->callback}();
            } else {
                $this->Security->blackHole($controller, 'login');
            }
        }
    }
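/**
 * Increments the counter for each rule that is still below its threshold, and
 * collects the rules whose established trend has been violated
 *
 * @return array The names of the rules that failed validation
 */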
    function incrementAndInvalidate() {
        $failures = array();
 
        foreach ($this->rules as $header => $rule) {
            $cur = $this->_config($header);
 
            if (strpos($header, ':') === false) {
                $hashValue = Security::hash(Configure::read('Security.salt') . env($header));
 
                if (isset($cur['value']) && $this->thresholdMet($header)) {
                    if ($cur['value'] != $hashValue) {
                        $failures[] = $header;
                    }
                } else {
                    $cur = $this->increment($rule, $cur, $hashValue);
                }
            } else {
                list($type, $match) = explode(':', $header, 2);
 
                switch ($type) {
                    case 'Net':
                        $value = $this->RequestHandler->getClientIP();
 
                        if (!isset($cur['value'])) {
                            $cur['value'] = $this->_getNetworkAddressPattern($match, $value);
                        }
 
                        if (preg_match($cur['value'], $value)) {
                            if (!$this->thresholdMet($header)) {
                                $cur = $this->increment($rule, $cur, $value);
                            }
                        } else {
                            if (!$this->thresholdMet($header)) {
                                // Reset the count and the network pattern
                                $cur = $this->increment($rule, null, $cur['value']);
                            } else {
                                $failures[] = $header;
                            }
                        }
                    break;
                }
                // Implement other counter checks here
            }
            $this->_config($header, $cur);
        }
        return $failures;
    }
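/**
 * Records the first value seen for a rule and advances its counter: the number
 * of consecutive matching requests, or the timestamp at which a time-based
 * threshold is met
 *
 * @param array $rule The rule definition, including its threshold
 * @param array $state The stored session state for the rule, if any
 * @param string $value The current value of the tracked criteria
 * @return array The updated state
 */
    function increment($rule, $state, $value) {
        $state = is_array($state) ? $state : array();
        $state['value'] = isset($state['value']) ? $state['value'] : $value;
        // A reset passes no state, so default the counter to avoid an undefined index
        $count = isset($state['count']) ? $state['count'] : 0;
 
        if (is_int($rule['threshold'])) {
            $state['count'] = ($state['value'] == $value) ? $count + 1 : 0;
        } else {
            if (empty($count)) {
                $state['count'] = strtotime($rule['threshold']);
            }
        }
        return $state;
    }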
/**
 * Returns true if the threshold for a rule has been met
 *
 * @param string $rule
 * @return boolean
 */
    function thresholdMet($rule) {
        if (!isset($this->rules[$rule])) {
            foreach ($this->rules as $key => $val) {
                if ($val['key'] == $rule) {
                    $rule = $key;
                    break;
                }
            }
        }
 
        if (!isset($this->rules[$rule])) {
            return false;
        }
        $config = $this->rules[$rule];
        $cur = $this->_config($rule);
 
        if (!$cur['count']) {
            return false;
        }
        $base = strtotime(date('Y') . '-1-1');
 
        if ($cur['count'] < $base && $config['threshold'] < $base) {
            // This is a request-count-based threshold
            return ($cur['count'] >= $config['threshold']);
        } else {
            // This is a time-based threshold
            return ($cur['count'] < strtotime('now'));
        }
        return false;
    }
/**
 * Returns the stored session state for a given threshold rule
 *
 * @param string $key A rule name array key
 * @param array $state An optional state to save to the session
 * @return mixed
 */
    function _config($key, $state = null) {
        $key = r(array('.', '!', ':', '{', '}', '(', ')'), '_', $key);
 
        if ($state === null) {
            $state = $this->Session->read("Session.History.{$key}");
            return (empty($state) ? array('count' => 0) : $state);
        } else {
            return $this->Session->write("Session.History.{$key}", $state);
        }
    }
/**
 * Generates a regular expression from a network address and a match pattern
 *
 * @param string $pattern
 * @param string $base
 * @return string
 */
    function _getNetworkAddressPattern($pattern, $base) {
        $result = '.*';
 
        if ($pattern == '!') {
            $result = preg_quote($base, '/');
        }
        return '/' . $result . '/';
    }
}
 
?>

Future plans for the component include scoping by network address class and additional flagging options, but there are plenty of other things you can do, such as integrating an IP-to-coordinate system and making the threshold geography-based. There are numerous other possibilities as well, but the important thing is to start thinking more broadly about your security strategies. Find patterns in your data and learn to use them to your advantage.

Happy holidays!

PHP Advent Calendar Day 23

Today's entry is provided by Jay Pipes.

Jay Pipes

Name: Jay Pipes
Blog: jpipes.com
Biography: Jay Pipes is the North American Community Relations Manager at MySQL. Coauthor of Pro MySQL (Apress, 2005), Jay regularly assists software developers in identifying how to make the most effective use of MySQL. He has given sessions on performance tuning at the MySQL Users Conference, RedHat Summit, NY PHP Conference, OSCON, SCALE, and Ohio LinuxFest, amongst others. He lives in Columbus, Ohio, with his wife, Julie, and his four animals. In his abundant free time, when not being pestered by his two needy cats and two noisy dogs, he daydreams in PHP code and ponders the ramifications of __clone().
Location: Columbus, Ohio

Recently, I've been busy getting the program for the MySQL Conference and Expo finalized (it's a big job!), and I wanted to take some time off from the incessant stream of emails to speakers and sponsors to send a little gift to the blogosphere. My gift for the PHP Advent Calendar is two completely random tips for you PHP and MySQL developers out there trying to squeeze performance out of your schemata and code. So, without further ado, here are my two random MySQL holiday tips.

Storing and Querying IPv4 Addresses

Many people aren't aware that any IPv4 address, commonly written using the dotted quad notation, is actually an unsigned 32-bit integer. People are used to seeing the dotted quad notation, which contains 4 separate integers with values from 0 to 255 separated by dots. Together, the integers identify the network within which the host machine is located and the host itself. For instance, on my laptop, sitting here at home on my local (class C) network, I see that my IP address is 192.168.0.2.
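The arithmetic behind that claim is easy to check from PHP. The sketch below combines the four octets by hand and compares the result with PHP's built-in ip2long() and long2ip(); the helper name dotted_quad_to_int() is made up for this example, and a 64-bit PHP build is assumed so that the value fits in a native integer.

```php
<?php
// Combine the four octets by hand: each one occupies 8 bits of the 32-bit value.
function dotted_quad_to_int(string $ip): int
{
    list($a, $b, $c, $d) = array_map('intval', explode('.', $ip));
    return $a * 16777216 + $b * 65536 + $c * 256 + $d;
}

$manual = dotted_quad_to_int('192.168.0.2');
// ip2long() can return a negative number on 32-bit builds, so format it unsigned.
$builtin = sprintf('%u', ip2long('192.168.0.2'));

echo $manual, "\n";          // 3232235522
echo $builtin, "\n";         // 3232235522
echo long2ip($manual), "\n"; // 192.168.0.2
```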

When application developers want to store an IP address in a database, I typically see a column definition such as the following:

CREATE TABLE users (
    user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    /* More Columns */
    ip_address CHAR(15) NOT NULL,
    INDEX (ip_address)
);

This makes sense, since the total amount of space possibly needed by a dotted quad notation is 15 characters (3 dots and 4 integers with a max of 3 characters per integer). However, the dotted quad notation is merely a textual representation of an unsigned integer that makes it easier for us to remember the IP address. When we store data in our schemata, though, we want to squeeze as many records into a single block of memory (or disk) as we possibly can. An INT UNSIGNED data type in MySQL needs 4 total bytes of storage, whereas the CHAR(15) needs 15 bytes of storage. If we store IPv4 addresses as unsigned integers, we can store nearly four times as many records in an index block.

MySQL comes with two functions that translate between dotted quad and the internal unsigned integer representation of an IP address. The two functions are INET_ATON() and INET_NTOA(). The former takes the dotted quad notation and converts it into an unsigned integer. The latter does the reverse.

Using these two functions, you can both store and retrieve IPv4 addresses easily. To store, do the following:

INSERT
INTO   users (user_id, ip_address)
VALUES (NULL, INET_ATON('192.168.0.2'));

Now the IP address is stored as an unsigned integer. SELECTing from the table shows us this:

mysql> SELECT *
    -> FROM users;
+---------+------------+
| user_id | ip_address |
+---------+------------+
|       1 | 3232235522 |
+---------+------------+
1 row in set (0.00 sec)

3232235522 isn't exactly friendly to the eyes. To convert, we use the INET_NTOA() function:

mysql> SELECT user_id, INET_NTOA(ip_address) as ip_address
    -> FROM users;
+---------+-------------+
| user_id | ip_address  |
+---------+-------------+
|       1 | 192.168.0.2 |
+---------+-------------+
1 row in set (0.00 sec)

To retrieve the users that have IP addresses in a given range (my local network, for instance), I can use:

mysql> SELECT *
    -> FROM users
    -> WHERE ip_address
    -> BETWEEN INET_ATON('192.168.0.1') AND INET_ATON('192.168.0.255');
+---------+------------+
| user_id | ip_address |
+---------+------------+
|       1 | 3232235522 |
+---------+------------+
1 row in set (0.00 sec)
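If the application already knows the network in CIDR notation, the two integer bounds for a BETWEEN query like this one can be computed in PHP before the query is built. This is a sketch under stated assumptions: cidr_bounds() is a hypothetical helper, a 64-bit PHP build is assumed, and the range it returns covers the whole network, so it starts at 192.168.0.0 and not at 192.168.0.1 as in the query above.

```php
<?php
// Hypothetical helper: integer bounds of an IPv4 network in CIDR notation.
function cidr_bounds(string $cidr): array
{
    list($network, $bits) = explode('/', $cidr);
    $bits = (int) $bits;
    $base = ip2long($network);
    if ($base === false || $bits < 0 || $bits > 32) {
        throw new InvalidArgumentException("Invalid CIDR block: $cidr");
    }
    // Netmask with the top $bits bits set, kept within 32 bits.
    $mask = $bits === 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    $low = $base & $mask;
    $high = $low | (~$mask & 0xFFFFFFFF);
    return [$low, $high];
}

list($low, $high) = cidr_bounds('192.168.0.0/24');
echo $low, ' ', $high, "\n"; // 3232235520 3232235775
```

Because the column holds plain integers, both values can be bound as ordinary numeric parameters, and the index on ip_address can still be used for the range scan.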

Tobias Asplundh and I did some benchmarks of storing IPv4 addresses as unsigned integers instead of CHAR(15) for last year's MySQL conference and found about an 8% performance difference searching small to medium-sized ranges on just one million records. So, it's worth the small effort to streamline your schemata and use the appropriate data types for IPv4 addresses.

The Worst-Named MySQL Status Variable Ever

Anyone who's ever seen me speak at conferences on performance tuning MySQL knows that I am fairly glib about my pet peeves with MySQL (and other things!). My biggest pet peeve used to be the old configuration variable log_long_format, which actually meant log any query not using indexes to the slow query log. Luckily, this configuration variable was renamed in MySQL 5.0.12 to log_queries_not_using_indexes. (Imagine that.)

Luckily for log_long_format, a new variable has taken its place on my pet peeves list: the status variable Qcache_free_blocks.

Before I get into why this little monster of a status variable is so, well, monstrous, here's a little background on the MySQL query cache.

Introduced way back in MySQL 4.0.1, the query cache stores a hash of the SELECT query issued against the server and the actual result set of data returned by the SELECT query. Barring any modifications to the underlying tables, a subsequent request for the exact same SELECT statement will not need to be optimized, analyzed, or even hit the storage engine layer. Instead, the query cache will simply return the pre-packaged result set directly to the requesting client. The way the query cache is structured is essentially a linked list of these result sets, stored in blocks. When an underlying table is modified, blocks containing queries that reference the modified table are marked as dirty, to be flushed out of the query cache at a later time.

For read-heavy applications, the query cache can be a significant performance boost. More on mixed and write-heavy applications later.

Now back to the little monster status variable.

The status variables which begin with Qcache represent counters that the MySQL query cache keeps in regard to the health and hit ratios of stored MYSQL_RESULTs. So, looking at the following output, what would you expect the Qcache_free_blocks variable to mean?

mysql> SHOW STATUS LIKE 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 22087    |
| Qcache_free_memory      | 64887904 |
| Qcache_hits             | 23945162 |
| Qcache_inserts          | 8200434  |
| Qcache_lowmem_prunes    | 658819   |
| Qcache_not_cached       | 8052109  |
| Qcache_queries_in_cache | 34818    |
| Qcache_total_blocks     | 91967    |
+-------------------------+----------+
8 rows in set (0.01 sec)

If I were a betting man — and those of you whom I have played poker with know that I am — I would bet money that the Qcache_free_blocks meant the number of free blocks in the query cache available to store stuff in. Right?

Wrong. Qcache_free_memory actually does mean what it looks like; it is the number of bytes available to store results within the query cache. However, Qcache_free_blocks means the number of blocks within the query cache that are fragmented and need to be cleaned up. Read that again to make sure you understand it. Even the MySQL documentation doesn't understand its purpose.

What exactly do the above statistics actually tell you? (By the way, the above is from one of the MySQL.com domain's primary DB servers.)

They tell you:

  1. There are a total of 91,967 blocks in the query cache, of which 22,087 are marked as containing results that are no longer valid. In this case, nearly a quarter (about 24%) of the blocks contain invalid result sets. (That's bad.)

    How do you relieve the query cache of this fragmentation? You could issue a FLUSH QUERY CACHE and wait a bit. FLUSH TABLES would also do it. The point is to make you aware that the tricky Qcache_free_blocks variable makes it seem as if all is good with the query cache, when in fact, it isn't!

  2. The Qcache_lowmem_prunes counter variable is 658,819. This means that the query cache has had to prune dirty, old, or invalid blocks more than 650 thousand times. Take a look at how long this server has been up and running:

    mysql> SHOW STATUS LIKE 'Upt%';
    +---------------+---------+
    | Variable_name | Value   |
    +---------------+---------+
    | Uptime        | 6449178 |
    +---------------+---------+
    1 row in set (0.00 sec)
    

    So, the server has been up and running for 6,449,178 seconds, or about 75 days. If you divide the seconds by the lowmem prunes, you see that the query cache is having to prune itself about once every 9 to 10 seconds. (That's bad.)
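The arithmetic in the two points above can be reproduced with a few lines of PHP. This is only a sketch: the numbers are copied by hand from the SHOW STATUS output shown earlier, and the variable names are chosen for the example.

```php
<?php
// Values copied from the SHOW STATUS output above.
$status = [
    'Qcache_free_blocks'   => 22087,
    'Qcache_total_blocks'  => 91967,
    'Qcache_lowmem_prunes' => 658819,
    'Uptime'               => 6449178,
];

$freeBlockShare = $status['Qcache_free_blocks'] / $status['Qcache_total_blocks'];
$uptimeDays = $status['Uptime'] / 86400;
$secondsPerPrune = $status['Uptime'] / $status['Qcache_lowmem_prunes'];

printf("Free blocks: %.1f%% of all blocks\n", $freeBlockShare * 100);  // 24.0%
printf("Uptime: %.1f days\n", $uptimeDays);                            // 74.6 days
printf("One low-memory prune every %.1f seconds\n", $secondsPerPrune); // 9.8 seconds
```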

Finally, find out the hit ratio for the MySQL query cache on this server. Ideally, you would only be concerned with the SELECT queries issued against the server, since those are the only ones that can be stored in the query cache. As a rough stand-in, take the total number of statements the server has received, which counts every statement and not only SELECTs:

mysql> SHOW GLOBAL STATUS LIKE 'Questions%';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| Questions     | 271117447 |
+---------------+-----------+
1 row in set (0.00 sec)

There have been a total of 271,117,447 queries on this server since going online. Compare this number with the Qcache_hits status variable of 23,945,162. So, approximately 1 in 10 queries against the database is handled directly by the query cache. Is this good? Not particularly, but there is a catch. The server I have been showing is a master server, so it's handling the write load of the MySQL.com domain. The slave servers handle much of the read load.

In this scenario, it may be best to simply switch off the query cache on this master server due to the heavy fragmentation and lowmem pruning.

By contrast, the MySQL Forge 2.0 server, which is read-heavy, shows the following information:

mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 45    |
| Qcache_free_memory      | 34744 |
| Qcache_hits             | 79095 |
| Qcache_inserts          | 43276 |
| Qcache_lowmem_prunes    | 7396  |
| Qcache_not_cached       | 20955 |
| Qcache_queries_in_cache | 6539  |
| Qcache_total_blocks     | 13517 |
+-------------------------+-------+
8 rows in set (0.00 sec)
 
mysql> SHOW GLOBAL STATUS LIKE 'Questions%';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Questions     | 201050 |
+---------------+--------+
1 row in set (0.00 sec)

Here you see about 1 in every 3 queries against the database being served directly from the query cache.
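For comparison, the two ratios can be computed directly from the counters shown above. The helper name qcache_hit_ratio() is an assumption made for this sketch. Because Questions counts every statement and not only SELECTs, treat the result as a rough indicator and not as an exact hit rate.

```php
<?php
// Hypothetical helper: share of all statements answered from the query cache.
function qcache_hit_ratio(int $hits, int $questions): float
{
    return $questions > 0 ? $hits / $questions : 0.0;
}

printf("Master: %.1f%%\n", 100 * qcache_hit_ratio(23945162, 271117447)); // 8.8%
printf("Forge:  %.1f%%\n", 100 * qcache_hit_ratio(79095, 201050));       // 39.3%
```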

The bottom line is this: investigate the query cache status variables to ensure:

  • Your cache is not badly fragmented.

  • The query cache hit ratio is decent. (If it isn't, you might consider turning off the query cache entirely.)

Happy holidays, y'all!

PHP Advent Calendar Day 22

Today's entry is provided by Derick Rethans. Today also happens to be Derick's birthday, so I hope you'll join me in wishing him a very happy birthday. (Because I'm a little late posting this, and Derick lives in Norway, I'm afraid this is a belated birthday wish. Sorry, Derick!)

Derick Rethans

Name: Derick Rethans
Blog: derickrethans.nl
Biography: Derick Rethans has contributed in a number of ways to the PHP project, including the mcrypt, date, and filter extensions; bug fixes; additions; and leading the QA team. He now works as project leader for the eZ components project for eZ systems A.S. In his spare time he likes to work on Xdebug, watch movies, travel, and practice photography.
Location: Skien, Telemark, Norway

This might not seem like a useful gem to most of you, but it has a coolness factor that I hope you appreciate. I am a geek and engineer, and I like to know how things work. Because I deal with lots of PHP things, I want to know how PHP works. So, I spend lots of time figuring this out while working on Xdebug, but that doesn't always go deep enough.

So, some years ago, I started hacking on a little tool called VLD. The original goal was to turn this into an encoder, but as I don't really care about encoding PHP files, it never made it that far.

What does it do? For each script, function, class, and method, this extension shows the internal execution units that represent your PHP code. There are a couple of things that you have to do before VLD shows you any output. First, of course, you have to install it. Follow the instructions, and add extension=vld.so to your php.ini file. If all is well, there should be a VLD section in your phpinfo() output. Because VLD outputs all of the opcodes to standard error, it's not really useful to run it through Apache; it is more suited to run from the command line.

Let's see what it does for the following script:

<?php
 
$a = 42;
 
if ($a < 50) {
    for ($b = 0; $b < $a; $b++) {
        echo sqrt($b), "\n";
    }
} else {
    echo "The value $a is too high.\n";
}
 
?>

After running this with the following command:

php -dvld.active=1 -dvld.verbosity=0 example1.php

You see output like this:

filename:       /tmp/example1.php
function name:  (null)
number of ops:  23
compiled vars:  !0 = $a, !1 = $b
line   #  op                     fetch  ext  return  operands
-------------------------------------------------------------
   3   0  ASSIGN                                     !0, 42
   5   1  IS_SMALLER                         ~1      !0, 50
       2  JMPZ                                       ~1, ->15
   6   3  ASSIGN                                     !1, 0
       4  IS_SMALLER                         ~3      !1, !0
       5  JMPZNZ                          9          ~3, ->14
       6* POST_INC                           ~4      !1
       7* FREE                                       ~4
       8* JMP                                        ->4
   7   9  SEND_VAR                                   !1
      10  DO_FCALL                        1          'sqrt'
      11  ECHO                                       $5
      12  ECHO                                       '%0A'
   8  13  JMP                                        ->6
   9  14  JMP                                        ->21
  10  15* INIT_STRING                        ~6
      16* ADD_STRING                         ~6      ~6, 'The+value+'
      17* ADD_VAR                            ~6      ~6, !0
      18* ADD_STRING                         ~6      ~6, '+is+too+high.%0A'
      19* PRINT                              ~7      ~6
      20* FREE                                       ~7
  13  21* RETURN                                     1
      22* ZEND_HANDLE_EXCEPTION

For every executable unit (script, function, method), it generates this type of output, showing the filename and function/method name, the number of opcodes, the IDs of compiled variables, and the opcodes (execution units) themselves.

By playing with the verbosity, you can control what kind of information VLD displays. A verbosity of 1 will add code path analysis to the output, showing which parts of the code can be executed (indicated by the * after the opcode #). A verbosity of 4 shows all possible information VLD can gather about the execution units. You can also instruct VLD not to execute the script you feed to PHP. Simply add the -dvld.execute=0 statement to the command line.

Interpreting this data is non-trivial, but I am sure you can figure it out. As a hint, !0 is a compiled variable, ~1 is a temporary value, and ->15 is a jump to opcode #15. In case you have questions, feel free to send me an email.

One last warning (just in case the big red warning on the site is not enough): VLD cannot be used to decode encoded files. Please do not ask me questions about this.

PHP Advent Calendar Day 21

Today's entry, provided by Luke Welling, is entitled Following the Big Dogs on Web Application Security.

Luke Welling

Name: Luke Welling
Blog: lukewelling.com
Biography: Luke Welling is from Melbourne, Australia, but currently lives near Washington, DC, where he ekes out a living as a security nerd at OmniTI. He sees lots of good PHP and bad PHP, and tries to write more good than bad. Over the last decade, he has applied PHP in many places where it was intended, and in many places where it was never meant to go. With his wife Laura, he wrote the bestselling book PHP and MySQL Web Development and often speaks about PHP at conferences and user groups. His hobbies include riding his horses and sticking Splayds in toasters, although he has not yet attempted to do both at once.
Location: Washington, District of Columbia

At this time of year, people are apt to get all warm and sentimental, right up until their first trip to a mall on a Saturday when they go back to hating their fellow man and instituting an "If Amazon don't sell it, you're not getting it" policy on gift giving. December is very important to retail, and very important to retail sites.

I remember some good advice I read a long time ago. Vincent Flanders & Michael Willis in Web Pages That Suck suggested you "follow the big dogs." In other words, copy Amazon. Their reasoning was sound. You will likely get it wrong on your first try, you can't afford to run usability studies of your own, and you don't want to spend months and numerous iterations getting it right. Learning from other people's mistakes is always less embarrassing than learning from your own.

I have had to paraphrase here, because I opted to recycle nearly all my old books rather than ship them half way around the world. Had I wanted to check the accuracy of my quote, it would have cost me one cent to buy a second-hand copy of that book.

While the long-term relevance of most of the advice in old computer books is fairly accurately reflected by that valuation, it was good advice in 1998. If you were embarking on an ecommerce venture at a time when there was a shortage of people who knew what they were doing, best practice conventions were not settled, and innovation was rapid, there were worse philosophies you could have than "What Would Amazon Do?"

The same idea is popular today, and for the same reason. There is always a shortage of people who really know what they are doing, so there are plenty of people making decisions by asking "What Would Google/Amazon/Microsoft/eBay/PayPal/Flickr/Yahoo/YouTube/Digg/Facebook Do?" If you are in a space where nobody really knows the best way yet, copying the segment leader is a low risk, low talent shortcut to making mainly good decisions, even if it does mean you are always three months behind.

The idea does not apply well to web application security. There are two main reasons for this: first, the big dogs make plenty of mistakes, and second, good security is invisible.

You might notice mistakes, you might read about exploited vulnerabilities, and you might notice PR-based attempts at the illusion of security, but you probably don't notice things quietly being done well.

Common big dog mistakes include:

Inviting People to Click Links in Email Messages
You would think that, as one of the most popular phishing targets out there, PayPal would not want to encourage people to click links in emails. Yet, if you sign up for a PayPal account, the confirmation screen requests that you do exactly that:

PayPal Confirmation Screen

Stupid Validation Rules
We all want ways to reject bad data, but it is usually not easy to define hard and fast rules to recognize it, even for data with specific formatting. Everybody wants a simple regex to ensure email addresses are well formed. Unfortunately, to permit any email that would be valid according to RFC 2822, a simple one is not going to cut it. As a result, many people add validation that is broken and reject some real addresses. Most are not as stupid as the one AOL used to have for signing up for AIM, which insisted that all email addresses end in .com, .net, .org, .edu, or .mil, but many will reject + and other valid non-alphanumeric characters in the local part of an address (the bit before the @).
Stupid Censorship Systems
Simple keyword-based censorship always annoys people. Eventually, somebody named Woodcock is going to turn up. Xbox Live is infamous for rejecting gamertags and mottos after validating them against an extensive list of "inappropriate" words. Going far beyond comedian George Carlin's notorious Seven Dirty Words, there is a list of about 2,700 words that are supposedly banned. By the time you add your regular seven, all possible misspellings thereof, most known euphemisms for body parts, racial epithets, drug-related terms, Microsoft brand names, Microsoft competitors' brand names, terms that sound official, and start heading off into foreign languages, you end up catching a lot of innocent phrases.
Broken HTML Filtering
Stripping all HTML from user-submitted content and safely displaying the result is often done badly, but is not that difficult. On the other hand, allowing some HTML formatting as user input, but disallowing "dangerous" parts is not an easy problem, especially if you are trying to foster an ecosystem of third party developers. The MySpace Samy worm worked not because MySpace failed to filter input, but because of a series of minor exploits that, combined, allowed arbitrary JavaScript. Once you choose to allow CSS, so that users can add what passes for style on MySpace, it becomes very hard to limit people to only visual effects. eBay has had less well-known problems with a similar cause, but without a dramatic replicating worm implementation. Earlier this year, scammers were placing large transparent divs over their listings, so that any click on the page triggered a mailto or loaded a page of their own. I could not see examples today, so I assume they have fixed the specific vector, but giving users a great deal of freedom to format the content that they upload makes ensuring that content is safe for others to view very difficult.
Stupidly-Long URLs
The big dogs love long, complicated URLs.
https://chat.bankofamerica.com/hc/LPBofA2/?visitor=&mses
sionkey=&cmd=file&file=chatFrame&site=LPBofA2&channel=we
b&d=1185830684250&referrer=%28engage%29%20https%3A//site
key.bankofamerica.com/sas/signon.do%3F%26detect%3D3&sess
ionkey=H6678674785673531985-3590509392420069059K35197612

When people get used to that sort of garbage from sites that they should be able to trust, you can't really be surprised that normal people can't tell the difference between an XSS attack hidden in URL-encoded JavaScript and a real, valid, safe URL. Even abnormal people who can decode a few common URL encodings in their heads are not really scrolling across the hidden nine tenths of the address bar to look at all that.
Looking for Simple Solutions
Security is not one simple problem, or even a set of simple problems, so looking for simple solutions such as the proposed .bank TLD is rarely helpful. This is not helped by the vendor-customer nature of much of the computer industry. The idea that you can write a check to somebody and a problem goes away is very compelling; buy a more expensive domain name, or a more expensive Extended Validation Certificate, or run an automated software scan to meet PCI compliance and you might sleep more soundly at night, but many users already don't understand the URL and other clues that their browser provides them. Giving more subtle clues to them is unlikely to help. Displaying a GIF in the corner of your web page bragging about your safety might create the illusion of security and might well help sales, but it won't actually help safety on its own.
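To make the earlier point about validation rules concrete, here is a minimal sketch that contrasts an over-strict pattern with PHP's built-in filter_var() and FILTER_VALIDATE_EMAIL. The naive pattern, the function names, and the sample addresses are made up for illustration. The built-in filter is not a complete RFC 2822 validator either, but it accepts + in the local part and does not hard-code a list of top-level domains.

```php
<?php

// Hypothetical over-strict rule of the kind described above:
// alphanumeric local part and a short whitelist of TLDs.
function naive_is_email($address) {
    $pattern = '/^[a-z0-9.]+@[a-z0-9.-]+\.(com|net|org|edu|mil)$/i';
    return preg_match($pattern, $address) === 1;
}

// PHP's built-in filter returns the address on success and false on failure.
function builtin_is_email($address) {
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

// Made-up sample addresses; both are syntactically valid.
$samples = array('jane+lists@example.com', 'joe@example.co.uk');

foreach ($samples as $address) {
    printf("%-24s naive: %s  built-in: %s\n",
           $address,
           naive_is_email($address) ? 'accept' : 'reject',
           builtin_is_email($address) ? 'accept' : 'reject');
}
```

Both sample addresses are rejected by the naive rule and accepted by the built-in filter, which is exactly the kind of false rejection that annoys real users.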

You can't follow the public example of the big dogs. They still make some dumb decisions, they still make the small mistakes that allow the CSRF and XSS exploits that are endemic, and they are often not very responsive to disclosures. If a major site makes 99 good security decisions and one bad one, you won't notice the 99. Unfortunately, with security, you are still far better off seeing how others have been exploited and critically evaluating what they say they should be doing, rather than trying to watch what they actually are doing.

Oh, and remember to stay away from malls on weekends in December.

PHP Advent Calendar Day 20

Today's entry, provided by Adam Trachtenberg, is entitled User-Defined Functions in SQLite.

Adam Trachtenberg

Name
Adam Trachtenberg
Blog
trachtenberg.com
Biography
Adam Trachtenberg is the Senior Manager of Platform Evangelism and Disruptive Innovation at eBay, where he preaches the gospel of the eBay platform to developers and businessmen around the globe. He's the author of Upgrading to PHP 5 and coauthor of PHP Cookbook, both published by O'Reilly Media. Adam lives in San Francisco, California, which he wishes was closer to the office.
Location
San Francisco, California

SQLite is a database that's bundled with PHP 5. Unlike most other databases, SQLite is not a separate application; it's an extension that reads from and writes to regular files.

Although the name SQLite hints at a less than full-featured product, besides the usual INSERTs and SELECTs, SQLite also boasts transactions, subselects, and triggers.

My favorite SQLite feature is the one that allows you to write your own SQL functions. In addition to all the built-in SQL functions, such as lower() and upper(), you can extend SQLite to include your own functions that you write in PHP.

These are known as user-defined functions, or UDFs for short. With a UDF, you embed logic into SQLite and avoid doing it yourself in PHP. Thus, you can take advantage of all of the features inherent in a database, such as sorting and finding distinct entries.

UDFs are good for chopping up strings so you can perform nonstandard collations and groupings. For example, suppose you want to sort through a list of URLs, maybe from a referrer log file, and create a list of unique hostnames sorted alphabetically. So, http://example.com/directory/index.html and http://example.com/page.html would both map to one entry: http://example.com/.

To do this in PHP, you need to retrieve all the URLs, process them inside your script, and then sort them. Plus, somewhere in all that, you need to remove the duplicates. If it weren't for that pesky URL-conversion process, this could all be done in SQL using DISTINCT and ORDER BY.

With a UDF like the following, you foist all that hard work back onto SQLite where it belongs:

<?php
 
// CREATE table and INSERT URLs.
$db = sqlite_open('/www/support/log.db');
$sql = 'CREATE TABLE access_log(url);';
 
$urls = array('http://example.com/directory/index.html', 
              'http://example.com/page.html');
 
foreach ($urls as $url) {              
    $sql .= "INSERT
             INTO   access_log
             VALUES ('$url');";
}
sqlite_query($db, $sql);
 
// UDF Written in PHP
function url2host($url) {
    $parts = parse_url($url);
    return "{$parts['scheme']}://{$parts['host']}/";
}
 
// Map the PHP function url2host() to the SQL function host(),
// and indicate that host() will take 1 argument.
sqlite_create_function($db, 'host', 'url2host', 1);
 
$r = sqlite_query($db, 'SELECT
                        DISTINCT host(lower(url))
                        AS       clean_host 
                        FROM     access_log
                        ORDER BY clean_host;');
 
// Loop through results.
while ($row = sqlite_fetch_array($r)) {
    echo "{$row['clean_host']}\n";
}
 
?>

As expected, this outputs the following:

http://example.com/

To use a UDF, you first write a regular function in PHP. The function's arguments are what you want to pass in during the SELECT, and the function should return a single value. The url2host() function takes a URL, calls the built-in PHP function parse_url() to break the URL into its component parts, and returns a string containing the scheme (http) and the host. So, http://example.com/directory/index.html gets broken apart into many pieces. http is stored in $parts['scheme'], and example.com goes in $parts['host']. This creates a return value of http://example.com/.

The next step is to register url2host() with SQLite using sqlite_create_function(). This function takes four arguments:

  1. The Database Handle

  2. The SQLite Function Name of Your Choice

  3. The Function's Name in PHP

  4. The Number of Expected Arguments

The last argument is optional, but if you know for certain that your function only accepts a specific number of arguments, providing this information helps SQLite optimize things behind the scenes. In this example, the SQL function is host(), while the PHP function is url2host(). These names can be the same; they're different here to make the distinction between them clear.

Now you can use host() inside any SQL query that uses the same database connection. The SQL above SELECTs host(lower(url)) AS clean_host. This takes the URL stored in the url column, converts it to lowercase, and calls the UDF host().

The function is not permanently registered with the database, and it goes away when you close the database. If you want to use it when you reopen the database, you must reregister it. Also, the function is only registered for that database; if you open up a new database using sqlite_open(), you need to call sqlite_create_function() again.

The returned string is then named AS clean_host; this lets you refer to the results later on in the SQL query as well as access the value in PHP using that name. Since you're still in SQLite, you can take advantage of this to sort the list using ORDER BY clean_host. This sorts the results in alphabetical order.
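The sqlite_*() functions used above belong to the original SQLite extension, which is no longer bundled with current PHP releases. As a rough sketch of the same technique on a newer installation, the following assumes the SQLite3 extension is available and uses its createFunction() method. The in-memory database and the extra example.org URL are assumptions added only to keep the example self-contained.

```php
<?php

// Same UDF as in the listing above: reduce a URL to scheme://host/.
function url2host($url) {
    $parts = parse_url($url);
    return "{$parts['scheme']}://{$parts['host']}/";
}

// Assumption: an in-memory database instead of /www/support/log.db.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE access_log(url)');

// Insert the sample URLs with a prepared statement.
$insert = $db->prepare('INSERT INTO access_log VALUES (:url)');
$urls = array('http://example.com/directory/index.html',
              'http://example.com/page.html',
              'http://Example.org/');

foreach ($urls as $url) {
    $insert->bindValue(':url', $url, SQLITE3_TEXT);
    $insert->execute();
    $insert->reset();
}

// Map the PHP function url2host() to the SQL function host(),
// which takes 1 argument.
$db->createFunction('host', 'url2host', 1);

$r = $db->query('SELECT   DISTINCT host(lower(url)) AS clean_host
                 FROM     access_log
                 ORDER BY clean_host');

// Collect and print the results.
$hosts = array();
while ($row = $r->fetchArray(SQLITE3_ASSOC)) {
    $hosts[] = $row['clean_host'];
    echo "{$row['clean_host']}\n";
}
```

As with the original extension, the registration lasts only as long as the connection object, so it has to be repeated for every new SQLite3 instance.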

There are plenty of other things you can do with UDFs, but hopefully this is enough to pique your interest and get you started.

PHP Advent Calendar Day 19

Today's entry is provided by Marcus Börger.

Marcus Börger

Name
Marcus Börger
Blog
marcus-boerger.de
Biography
Marcus Börger is a specialist in C, C++, databases, UML, XML, and of course PHP. To the PHP community, he is also known as helly. As a core developer, he contributes a lot to PHP and focuses on the new OO features of PHP 5 and Zend Engine 2. Marcus has been hacking around on all sorts of things for over 15 years, and being an avid snowboarder, he happily accepted an offer from Google to work in their Zürich office. He can often be found snatching gummy bears from his favorite Italian pizzeria and taking photographs.
Location
Zürich, Switzerland

Before I joined Google, a Swiss engineer had an idea to create an internal service at Google that generates graphs from simple URLs. Shortly thereafter, the service became quite popular internally. During this time, use of the service was restricted; a key was required. This past summer, we decided to open the service to the general public, hoping a few people would like the idea as much as we did. There is now an easy, keyless service that generates various types of charts such as line charts, bar charts, pie charts, and more.

Consider the following example chart:

Hello World

This chart is generated from the following URL:

http://chart.apis.google.com/chart?...mp;chd=s:foobar

This URL consists of the base URL (http://chart.apis.google.com/chart?) followed by the query string:

  • The chart type (cht=p3) is a 3D pie chart.

  • The chart size (chs=200x100) is 200 pixels wide and 100 pixels high.

  • The chart title (chtt=Hello+World) is Hello World.

  • The chart data (chd=s:foobar) is the simple string foobar. Choices for encoding are simple (s), extended (e), and text (t). The encoding is separated from the data by a colon.

Incoming data must be scaled to the range of the encoding, so you need to provide a way to do this. When using simple encoding, the API accepts 62 values:

  • Uppercase A (0) through Z (25)

  • Lowercase a (26) through z (51)

  • Digits 0 (52) through 9 (61)

You can indicate a missing value with an underscore (_), and you can separate data sets with a comma (,).
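To see what the mapping above means for the sample data foobar, here is a small sketch that goes in the opposite direction and turns a simple-encoded string back into numbers. The function name simple_decoding() is hypothetical, and the sketch handles a single data set only (no commas).

```php
<?php

// Hypothetical helper: map each simple-encoding character back to its
// value (0 through 61); an underscore or unknown character becomes NULL.
function simple_decoding($encoded) {
    $codes = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' .
             'abcdefghijklmnopqrstuvwxyz' .
             '0123456789';
    $values = array();

    foreach (str_split($encoded) as $char) {
        $position = strpos($codes, $char);
        $values[] = ($position === false) ? null : $position;
    }

    return $values;
}

// The data from the example URL.
print_r(simple_decoding('foobar'));
```

For foobar, this yields 31, 40, 40, 27, 26, and 43, which are the six slice values of the pie chart.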

The following function (simple_encoding()) takes care of the encoding for you:

<?php
 
function simple_encoding(array $data, $min = 0, $max = 61) {
    $codes = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' .
             'abcdefghijklmnopqrstuvwxyz' .
             '0123456789';
    $min = (float)$min;
    $max = (float)$max;
    $diff = ($max - $min) / 61;
    $result = '';
 
    foreach ($data as $value) {
        // A NULL entry marks a missing value.
        if (is_null($value)) {
            $result .= '_';
        } else {
            // Scale the value into the range 0 through 61.
            $value = (int)(((float)$value - $min) / $diff);
 
            if ($value < 0 || $value > 61) {
                $result .= '_';
            } else {
                $result .= $codes[$value];
            }
        }
    }
 
    return $result;
}
 
var_dump(simple_encoding(array(-10, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100), 20));
var_dump(simple_encoding(array(-10, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100), 20, 100));
 
?>

This produces the following:

string(12) "___AOds7____"
string(12) "___AHPWemt19"

More about the API can be found at chart.apis.google.com.

PHP Advent Calendar Day 18

Today's entry, provided by Christian Wenz, is entitled WSDL Despite PHP 5.

Christian Wenz

Name
Christian Wenz
Blog
hauser-wenz.de/blog/
Biography
Christian Wenz got hooked on PHP when he introduced it to one of the largest web sites back in early '99, and he hasn't looked back since.
Location
Munich, Germany

We are working a lot with web services these days. One of the things I like best, from a technical perspective, is the WSDL standard. WSDL is used to provide information about a SOAP web service: operation, data types, messages, location, and the like. Using WSDL, consuming a web service is very easy. Most technologies support an implementation of the proxy pattern and can automatically create a local proxy object from a WSDL description. Here's a quick example that shows how PHP 5's very own SOAP extension works with WSDL:

<?php
 
$c = new SoapClient('/path/to/wsdl');
$result = $c->myWebServiceMethod($arg1, $arg2);
 
?>

Calling myWebServiceMethod() on the local client object instructs the SOAP extension to call the remote method of the same name and parse the SOAP data coming back from the service. Unfortunately, the SOAP extension does not support automatic WSDL creation for a SOAP service, whereas most competing technologies do. One of the beauties of PHP is that the language is not strongly typed, which makes WSDL generation more difficult. However, there are several ways to generate WSDL with PHP 5, with a little help.

Userland Code

One of the oldest web services libraries for PHP is NuSOAP. The original site has not been updated in over three years, but NuSOAP remains in active development on SourceForge. When creating a NuSOAP service, call the configureWSDL() method, and provide the data types for input arguments and the return value when calling register(). Here's a short example (sans the actual business logic of the service, which resides in myWebServiceMethod()):

<?php
 
// The function name must match the name passed to register() below.
function myWebServiceMethod($inputValue) {
    /* ... */
}
 
require 'nusoap.php';
 
$soap = new soap_server();
$soap->configureWSDL('NuSOAPService', 'http://hauser-wenz.de/');
$soap->wsdl->schemaTargetNamespace = 'http://soapinterop.org/xsd/';
 
$soap->register('myWebServiceMethod',
                array('inputValue' => 'xsd:string'),
                array('outputValue' => 'xsd:string'),
                'http://soapinterop.org/');
 
$HTTP_RAW_POST_DATA = isset($HTTP_RAW_POST_DATA) ? $HTTP_RAW_POST_DATA : '';
$soap->service($HTTP_RAW_POST_DATA);
 
?>

NuSOAP runs on PHP 4 and PHP 5. A NuSOAP-based web service is obviously compatible with the SOAP extension and other SOAP implementations as well.

Extensions

A binary PHP extension performs much better than userland code. An extension that supports WSDL generation is WSF/PHP from WSO2. Once you install it, XML comments provide hints to the extension about the data types and operations used in the service.

<?php
 
function myWebServiceFunction($inputValue) {
    /* ... */
}
 
$ops = array('myWebServiceFunction' => 'myWebServiceFunction');
$svr = new WSService(array('operations' => $ops, 'bindingStyle' => 'rpc-enc'));
$svr->reply();
 
?>

When you request this page, you get basic information about the service. When you append ?wsdl to the address, you get a fitting WSDL description.

Additional Options

Some other tools and packages promise WSDL generation; here is a (possibly incomplete) list:

Even if you don't want to rely on external code or extensions, using one of the aforementioned options might still be of interest. You can use them to create the WSDL, and then use that WSDL (after modifying the service's address) to fuel the SOAP extension-based service.

Happy holidays!

PHP Advent Calendar Day 17

Today's entry is provided by Ilia Alshanetsky.

Ilia Alshanetsky

Name
Ilia Alshanetsky
Blog
ilia.ws
Biography
Ilia Alshanetsky is an active member of the PHP development team and is the current release manager for PHP 5.2. Ilia is also the principal developer of FUDforum, an open source bulletin board, and he contributes to several other projects.
Location
Toronto, Canada

I often work with very large projects that contain hundreds or even thousands of files, and I have observed a common and rather embarrassing mistake: parse errors. Developers forget to check the syntax of their code, and as a result, the application starts displaying E_PARSE errors to users. Fortunately, there is an easy way to quickly check your code for silly parse errors with the following command:

find /path/to/code -name \*.php | xargs -n1 php -l

This command searches through the /path/to/code directory for all files with a .php extension (adjust as necessary to match your own naming conventions; add another extension such as .inc by appending -o -name \*.inc to the find command) and passes them one at a time to PHP's CLI binary with -l to indicate lint mode. In this mode, the file is parsed but not executed, and any existing parse errors will be identified. Execution stops if a parse error is encountered.

You can typically check a few hundred files for parse errors within a few seconds. You now have no excuse for allowing parse errors to escape into the wild. :-)
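If find and xargs are not at hand, the same check can be sketched in PHP itself. The following is only an illustration: lint_directory() is a hypothetical helper, it assumes it is run from the CLI (so that PHP_BINARY points at the interpreter) with exec() enabled, and unlike the one-liner above it keeps going after the first failure and collects every file that does not pass php -l.

```php
<?php

// Hypothetical helper: lint every .php file below $dir with "php -l"
// and return the paths of the files that failed.
function lint_directory($dir) {
    $failures = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }

        // PHP_BINARY is the running interpreter (assumes the CLI SAPI).
        $command = escapeshellarg(PHP_BINARY) . ' -l ' .
                   escapeshellarg($file->getPathname()) . ' 2>&1';
        $output = array();
        exec($command, $output, $status);

        // A non-zero exit status means the file did not lint cleanly.
        if ($status !== 0) {
            $failures[] = $file->getPathname();
        }
    }

    sort($failures);
    return $failures;
}

// Demo with two throwaway files in a temporary directory.
$dir = sys_get_temp_dir() . '/lint_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/good.php", "<?php echo 'ok';\n");
file_put_contents("$dir/bad.php", "<?php if (true) {\n"); // Unclosed brace.

$failed = array_map('basename', lint_directory($dir));
print_r($failed);
```

In the demo, only bad.php is reported. Pointing lint_directory() at a real code directory works the same way.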

PHP Advent Calendar Day 16

Today's entry, provided by Jeff Moore, is entitled What We Can Learn about Software Development from a Failing Restaurant.

Jeff Moore

Name
Jeff Moore
Blog
procata.com/blog/
Biography
Jeff Moore is a columnist for php|architect who has been working with PHP for seven years and programming for two or three times that long, depending upon how you count.
Location
West Branch, Michigan

I like to cook. I especially like to cook for the holidays. Four or five times a year, I get to go hog wild and spend most of a day just cooking. (This Christmas, the menu is shaping up to be roast pork loin with cranberry apple sauce, roasted Brussels sprouts, scalloped potatoes, and a yam dish of some sort with maple syrup.) People sometimes tell me that I should cook professionally, but I'm really not that good at it. I just smile and say that I wouldn't want to ruin my enjoyment by making a job out of it. You see, I've never really worked in the food industry. There's not even a "do you want fries with that" in my past. I do have one guilty pleasure: a way to live vicariously in the restaurant trade.

You may have shared my indulgence. It's called reality TV. I like to watch shows that are not specifically about the preparation of food, but rather the restaurant business in general. I first got hooked on a show called The Restaurant. Then, I discovered Gordon Ramsay's Kitchen Nightmares, the British version followed by the American one. Don't forget the Canadian Restaurant Makeover. My TiVo doesn't. Drama and show business aside, I think there are things that we as programmers can learn from these shows. I'd like to focus on Gordon Ramsay's show.

The premise of each show is similar. There is a restaurant that is in trouble, and it needs to be fixed. Surprisingly, although each restaurant is different, each has problems that share similar patterns, and the same solutions are applied. (Watching these shows reminds me quite a bit of MBA case studies.)

The first segment of these shows is usually a review of the menu. Gordon Ramsay is a natural performer, with a face that was born to show disgust. He winces at strange flavor combinations, picks apart the dishes, and waves his hand up and down complicated menus lamenting the lack of focus.

This seems to be a common problem for restaurant owners. They don't want to leave any possibility unexploited. The menu expands to include any dish that anyone has ever asked for. The customers are overwhelmed by variety. The kitchen can't maintain quality across the array of choices. The restaurant is unremarkable because it does not excel at any one thing.

We can see this at work in the software industry. Have you ever worked on a bloated project? Have you worked on a project where no core feature stood out for its value, and where the feature list was all over the map? I have.

Many of these restaurant owners have a vision about the kind of restaurant they want to run. But, that vision doesn't always match what the customers in their community want. They open a fine dining restaurant in a working class neighborhood, or when they can barely cook without the help of prepackaged food and a microwave.

Sometimes, the restaurant staff has a hard time reconciling their vision with reality. The cognitive dissonance makes for good television. The chefs' assessment of their own food may not have any basis in reality. For the owners, hard times and failure breed a conservative reluctance to change. They don't want to alienate that last meager customer base they have. Ramsay sometimes has to resort to extraordinary measures to realign the stakeholders' conception of the restaurant, the menu, and themselves.

Ramsay uses a variety of techniques. If the chef produces foul tasting food, Ramsay blindfolds him and makes him taste it. If the chef thinks people like the lousy food, Ramsay takes the dish on the street and does taste comparisons. If the owner has no idea why his restaurant is empty, Ramsay goes out into the community and asks people why they don't go there. Anyone familiar with the principles of agile development should recognize the power of introducing feedback into the process.

This is the part of the show that interests me the most. The owner's vision has to be aligned with the community's needs. The menu has to be aligned with the staff's ability. Software projects require the same goal alignment.

Many of these establishments have suffered an overall decline in standards. Ramsay sets out to instill a pride in one's work among the staff. If the kitchen is messy, he makes them clean it. If there is bad or rotten food, he gets rid of it. If something isn't right, he makes them do it over again. Low standards beget lower standards. Along the same lines, I think sloppy code encourages more sloppy code. Ramsay says the food represents the cooks. Your code represents you. Take pride in your work.

Sometimes, the cook just wants to get the food done, and doesn't care what the customer thinks of it. In one episode, a chef drops a chicken wing on the floor and then tosses it in the fryer and intends to serve it. The grease cleans it, he claims! Have you witnessed the software equivalent of serving chicken wings off the floor? This attitude stems from a lack of empathy with the customer. Do you make fun of your users? Do you care what they think? Gordon Ramsay cares.

There are two versions of Ramsay's show. I prefer the British version, mostly because of the follow-up visit that shows whether the changes have stuck. The American version also includes an Oprah-inspired giveaway; the restaurant gets a new stove or new dishes. To me, this only confounds the social aspects of the show that I find so interesting.

Others have written about this show from a software development viewpoint. Watch the show yourself to see what you can get out of it.

PHP Advent Calendar Day 15

Today's entry, provided by Paul Reinheimer, is entitled Channels and Output.

Paul Reinheimer

Name
Paul Reinheimer
Blog
blog.preinheimer.com
Biography
Born in Vancouver, raised in Ontario, educated in Windsor, currently roaming the streets of beautiful Montreal. When not fighting off crazy Internet vixens, Paul pays his hosting and Internet bills by taking care of training for php|architect, launching his own projects like funcaday, and speaking at various conferences.
Location
Montreal, Canada

When getting started with PHP programming, we memorize rules that those who came before us hand down, such as:

  • Use mysql_real_escape_string() when you're sending data to a MySQL database.

  • Use htmlentities() when you're outputting data to a web page.

The rules might not initially make a lot of sense, but given time, we learn.

The primary reason we have such rules isn't that MySQL does a poor job of interpreting data and browsers are silly, but that we're sending data and commands (or data and metadata) through the same channel. A single stream of information carries information like php.net and instructions like <title> tags. With a single stream that carries both the instructions and the data, the receiving system might misinterpret one for the other. This misinterpretation is the basis of problems like cross-site scripting and SQL injection.

To protect against these attacks, you have two choices. You can either separate the streams or escape the data to avoid confusion. When you use prepared statements with SQL, you are essentially separating the streams; the queries you send to the database are sent separately from the data, maintaining the necessary distinction.

Using separate channels is secure and convenient, but it isn't always an option. An alternative option is to escape the data. Escaping is deceptively simple; determine which characters might be misinterpreted as instructions (special characters), and escape those characters to preserve their original meaning. Native functions exist for most common contexts, such as htmlentities() for HTML and mysql_real_escape_string() for MySQL.

Be sure to take character encoding into account. For HTML, the Content-Type header should indicate the same character encoding as htmlentities(). For MySQL, mysql_real_escape_string() maintains this consistency for you.
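Both approaches can be sketched in a few lines. This is an illustration under stated assumptions rather than a recipe: it uses PDO with the SQLite driver and an in-memory database so that it runs on its own, and the table, the variable names, and the hostile input strings are made up. The first half sends the query and the data separately with a prepared statement; the second half escapes data for the HTML context and names the character encoding explicitly.

```php
<?php

// Hypothetical user input that mixes data with would-be instructions.
$name    = "Robert'); DROP TABLE users; --";
$comment = '<script>alert("xss")</script>';

// Separate channels: the query text and the data travel separately.
// Assumption: PDO with the SQLite driver and an in-memory database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');

$stmt = $db->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute(array(':name' => $name));

// The value comes back exactly as it went in; it was never parsed as SQL.
$stored = $db->query('SELECT name FROM users')->fetchColumn();

// Single channel: escape for the HTML context, naming the encoding
// that the Content-Type header of the page is assumed to declare.
$safe = htmlentities($comment, ENT_QUOTES, 'UTF-8');

echo $stored, "\n", $safe, "\n";
```

The stored name is identical to the input, and the comment is printed as &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;, so a browser displays it as text instead of running it.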

The next time you're sending data to an external resource, see if it's possible to separate the channels. If not, refer to the external resource's documentation to determine which characters are treated in a special way, and escape your data accordingly.

PHP Advent Calendar Day 14

Today's entry, provided by David Sklar, is entitled Timing and Profiling.

David Sklar

Name
David Sklar
Blog
sklar.com/blog/
Biography
David Sklar is a Software Architect at Ning, author of Learning PHP 5 (O'Reilly), PHP Cookbook (O'Reilly), and Essential PHP Tools (Apress), and a fan of half-sour pickles.
Location
New York, New York

You probably want your programs to run as fast as possible. This thrilling PHP Advent Calendar entry talks about ways to time and profile your code, so you can figure out what parts are slow and therefore deserving of your optimization efforts.

microtime() is a simple and direct way to track how long something takes, since it gives you a timestamp that includes microseconds. (The actual precision varies based on the floating point representation on your system.) Just call microtime() before and after the code you want to time:

<?php
 
$filename = 'index.php'; // Sample input; any filename will do.

$start = microtime(TRUE);
preg_match('@^[a-z]+\.(php|html|js)$@', $filename);
$elapsed = microtime(TRUE) - $start;
printf("The regex match took %.06f seconds.\n", $elapsed);
 
?>

This produces something like:

The regex match took 0.000169 seconds.

Not bad, but only really useful when compared to something else. So, does the regular expression get faster or slower (or is there no change) if the subpattern is changed to non-capturing?

<?php
 
$filename = 'index.php'; // Sample input; any filename will do.

$start = microtime(TRUE);
preg_match('@^[a-z]+\.(php|html|js)$@', $filename);
$elapsed = microtime(TRUE) - $start;
printf("The regex match took %.06f seconds.\n", $elapsed);
 
$start = microtime(TRUE);
preg_match('@^[a-z]+\.(?:php|html|js)$@', $filename);
$elapsed = microtime(TRUE) - $start;
printf("The non-capturing regex match took %.06f seconds.\n", $elapsed);
 
?>

The result looks something like:

The regex match took 0.000170 seconds.
The non-capturing regex match took 0.000017 seconds.

One sample each doesn't provide much confidence, so better to run a loop of many iterations of each:

<?php

// Sample input for illustration; substitute the filename you want to test.
$filename = 'index.php';
$iterations = 1000;

// Accumulate the time spent in the capturing match only.
$elapsed = 0;
for ($i = 0; $i < $iterations; $i++) {
    $start = microtime(TRUE);
    preg_match('@^[a-z]+\.(php|html|js)$@', $filename);
    $elapsed += microtime(TRUE) - $start;
}

printf("capturing: %d iter in %.06f secs = %.06f iter/sec\n",
       $iterations, $elapsed, $iterations / $elapsed);

// Same measurement for the non-capturing match.
$elapsed = 0;
for ($i = 0; $i < $iterations; $i++) {
    $start = microtime(TRUE);
    preg_match('@^[a-z]+\.(?:php|html|js)$@', $filename);
    $elapsed += microtime(TRUE) - $start;
}

printf("non-capturing: %d iter in %.06f secs = %.06f iter/sec\n",
       $iterations, $elapsed, $iterations / $elapsed);

?>

This shows that the non-capturing version is faster, but not by that much:

capturing: 1000 iter in 0.003977 secs = 251472.150609 iter/sec
non-capturing: 1000 iter in 0.003459 secs = 289082.914053 iter/sec

You can enhance this in other ways, but the basic idea is to use microtime() to keep track of how long a bit of code takes to run, run that code a number of times, and keep track of the total elapsed time. This makes it easy to compare the speed of different solutions to a problem.
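One way to enhance this is to move the timing loop into a function, so each candidate is measured the same way. The following is only a sketch: time_iterations() is a hypothetical helper name, the sample filename is made up, and the closure syntax assumes a newer PHP than the one the examples above were written for.

```php
<?php

// Hypothetical helper: run $callback $iterations times and return the
// total elapsed time in seconds, using the microtime() technique above.
function time_iterations(callable $callback, $iterations)
{
    $elapsed = 0.0;
    for ($i = 0; $i < $iterations; $i++) {
        $start = microtime(TRUE);
        $callback();
        $elapsed += microtime(TRUE) - $start;
    }
    return $elapsed;
}

$filename = 'index.php'; // sample input, chosen for illustration
$iterations = 1000;

$patterns = array(
    'capturing'     => '@^[a-z]+\.(php|html|js)$@',
    'non-capturing' => '@^[a-z]+\.(?:php|html|js)$@',
);

foreach ($patterns as $label => $pattern) {
    $elapsed = time_iterations(function () use ($pattern, $filename) {
        preg_match($pattern, $filename);
    }, $iterations);

    printf("%s: %d iter in %.06f secs\n", $label, $iterations, $elapsed);
}
```

Note that the cost of calling the closure is included in every measurement, so the absolute numbers are slightly inflated; the comparison between candidates is still fair because each one pays the same overhead.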

Critical to keep in mind when doing timing like this, however, is how much user-visible gain your performance improvements will actually show. You don't want to make things intentionally slower with no benefit, but often a slower-running but easier-to-read solution is a better choice, especially when the performance difference may be minimal. If a regex match can be run 250,000 times a second or 290,000 times a second, choosing one regex over another isn't going to make much of a difference in practice.

A profiler such as Xdebug is very helpful for giving you the information you need to direct your performance enhancement efforts in the right direction. With Xdebug, you can identify what parts of your code run most frequently, or take up the biggest chunk of the processing time of a page. That way, you can work on speeding up what will actually make the biggest difference in total runtime.

Xdebug is a binary extension to PHP. You can download a library for Windows from xdebug.org. On other platforms, you can install it with the pecl tool. The full documentation on Xdebug's profiling capabilities is at xdebug.org/docs/profiler. Once you enable it in your PHP configuration, it observes PHP programs that execute, writing profiling data to individual text files (which can grow quite large). Then, you can load these files into a tool such as KCacheGrind or WinCacheGrind. These tools give you a pretty graphical view of the profiling data.
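Before hunting for profiling output, it can help to confirm that the extension is actually loaded and to see which profiler settings are in effect. The sketch below is one way to do that. The function name profiler_settings_report() is hypothetical, and the setting names listed are the ones Xdebug 2.x and 3.x are understood to use; check the Xdebug documentation for the version you have installed.

```php
<?php

// Report whether Xdebug is loaded and what its profiler settings are.
// ini_get() returns FALSE for a setting the loaded build does not define.
function profiler_settings_report()
{
    if (!extension_loaded('xdebug')) {
        return "Xdebug is not loaded.\n";
    }

    // Assumed setting names; they differ between major Xdebug versions.
    $settings = array(
        'xdebug.profiler_enable',     // Xdebug 2.x
        'xdebug.profiler_output_dir', // Xdebug 2.x
        'xdebug.mode',                // Xdebug 3.x
        'xdebug.output_dir',          // Xdebug 3.x
    );

    $report = 'Xdebug ' . phpversion('xdebug') . " is loaded.\n";
    foreach ($settings as $name) {
        $value = ini_get($name);
        $shown = ($value === FALSE) ? '(not defined)' : var_export($value, TRUE);
        $report .= sprintf("%s = %s\n", $name, $shown);
    }
    return $report;
}

echo profiler_settings_report();
```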

Profiling data is essential for effective performance optimization. It might seem worthwhile to change a preg_match() call into some plain string manipulation functions, but if that code is executed infrequently, you won't see much speedup. Xdebug can show you which areas deserve your attention.

PHP Advent Calendar Day 13

Today's entry, provided by Terry Chay, is entitled Filter Input; Escape Output: Security Principles and Practice.

Terry Chay

Name
Terry Chay
Blog
terrychay.com/blog/
Biography
When Zend puts your face on a trading card, you've either arrived in the PHP world, or you're a terrorist. Terry Chay is a PHP terrorist. Being the software architect of Tagged pays the bills. When he isn't saying politically incorrect things about web development, he is in ur Web 2.0 event, eating ur lunch, taking ur photos, and fighting off ur Ruby developers with his mad ninja coding skillz. He also likes to "draw the line at yellow."
Location
San Francisco, California

One of the strangest things about living in the Bay Area is the total lack of PHP support groups. We probably have the largest density of PHP developers (skilled and unskilled) in the world, and yet finding 50 people willing to go in on a shipment of elePHPants turns you into a rock star out here.

So when there was a San Francisco PHP Meetup, I had to go. The topic this month was security. In light of that, I thought I'd use my Advent calendar entry to show that I'm not just a front-end Ajax developer guy or a PHP design patterns guy. I can also roll with the big boys and talk about web app security.

Terry and Security, the Real Oxymoron

Those who have heard my latest talk, The Internet is an Ogre, might find my choice of topic a bit ironic. After all, I am the one who said, "Web security is a luxury," and:

For any of you who think good coding, design aesthetic, or web security are important, I have only one word for you: MySpace.

If there is one thing I have learned from blogging, it's that when you say outrageous things, people listen, and a few actually believe you. And, if the claim has the added bonus of being possibly true, that's just gravy. (Even if you're full of shit, people will just say you're perceptive; nobody has the balls to call you out on it, except people on Slashdot, and everyone knows they're like a stopped clock, only right twice a day.)

Besides, conferences are not fun unless you can get a rise out of Chris Shiflett, Ed Finkler, and Ilia Alshanetsky at the same time.

The truth is that security questions are some of my favorite interview questions. I'm going to cover what I started ranting about at the San Francisco PHP Meetup: these interview questions, how I'd answer them, and how this applies to my understanding of PHP and web application security.

No candidate has ever answered all these questions correctly. A coworker observed one of my interviews and said afterward, "When you asked those security questions, I thought [the candidate] was actually going to cry." I have had many headhunters complain about my interview questions to my boss.

Practice Comes from Principles

A lot of you are asking, "What book should I buy?" I recommend Chris Shiflett’s book (Essential PHP Security). (If you are too poor to buy the book, just visit the old PHP Security Guide instead.) Why? Because it is impossibly small.

Web app security is both really simple and an infinite mass of shit. If you start with an ad hoc approach, it will seem to only be the latter; but, if you take the time to learn the building blocks which form the language of security principles, then it starts to all make sense and become the former.

Books like this, by being small, focus on the vocabulary and principles without drowning you in detail. I want you to take the time to learn this language. If you don't have the vocabulary, then you can't do web app security. Now, onto the interview questions.

Question: What is an SQL Injection Attack? Give Me an Example.

SQL injection is a vulnerability that allows input to manipulate the format of an SQL query, causing unwanted SQL to be executed by the database.

Most candidates get that part, but the second part trips about half of all candidates.

Little Bobby Tables

The basic points I'm looking for are:

  • Have you ever thought like an attacker? If you can't think like an attacker, you can't think like a defender; if you can't create exploits, you can't defend against them.

  • Does your exploit include a basic escape sequence?

  • Does your exploit inject data?

Bonus points if you can explain why PHP's MySQL extension isn't vulnerable to the xkcd exploit.
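For readers who want to see those points in a concrete form, here is a minimal sketch that only builds the query string and never touches a database. The table name, column name, and input are taken from the xkcd strip and are otherwise hypothetical.

```php
<?php

// Hypothetical form input, modeled on the xkcd "Bobby Tables" strip.
$name = "Robert'); DROP TABLE Students;--";

// Vulnerable: the input is pasted straight into the SQL string, so the
// quote in $name ends the string literal and the rest is parsed as SQL.
$sql = "INSERT INTO Students (name) VALUES ('$name')";

echo $sql, "\n";
```

The printed query is `INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students;--')`. The single quote is the escape sequence, and everything after it is injected SQL. As for the bonus question, the PHP manual describes mysql_query() as sending a single query per call, so the stacked DROP TABLE would be expected to fail rather than run; that does not make the concatenated query safe, since an injection that stays within one statement still works.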

Question: Name Three Safeguards Against SQL Injection. For Each, Explain Where You Use It.

The key to answering this question is to understand that the nature of the attack is often focused on a quote mark. So, the solutions are to remove the quote mark, escape the quote mark, or use a built-in feature to protect against the quote mark.

In other words:

  1. You can filter the quote mark on input.

  2. You can escape the quote mark, using something like mysql_real_escape_string(), just before output (to the database).

  3. You can use a prepared statement, if your database supports it. (PDO emulates prepared statements if the database doesn't support them. Both the database and the extension must support prepared statements, else what you are doing is just an abstracted version of the second answer.)

Almost every candidate can give at least one, although that's not quite fair, since they have their DB interview first; they can learn about prepared statements there. Many candidates get all three with a little guidance!

Bonus points if you mention mysql_real_escape_string() and more if you explain the difference between it and addslashes(). (This has never occurred, so I'm not too sure how I'd feel if someone pointed it out.)

Very few people know where to implement these safeguards. I'm hoping the new filter extension changes this. A number of people have argued with me about the correct place to filter input and escape output. Many are competent Perl developers. You'll better understand why they can't accept their mistake later in this entry.
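The third safeguard, a prepared statement, can be sketched with PDO as follows. This is an illustration under assumptions: it relies on the pdo_sqlite driver being available so that an in-memory database keeps the example self-contained, and the table and column names are made up.

```php
<?php

// Assumes the pdo_sqlite driver is installed. The in-memory database,
// table name, and column name exist only for this illustration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE authors (name TEXT)');

$name = "Tim O'Reilly"; // a legitimate value that contains a quote

// The query text and the data travel separately, so the quote in $name
// cannot change the shape of the statement.
$insert = $db->prepare('INSERT INTO authors (name) VALUES (:name)');
$insert->execute(array(':name' => $name));

$stored = $db->query('SELECT name FROM authors')->fetchColumn();
echo $stored, "\n";
```

The name comes back with its quote intact, which is something a filter that strips quotes on input could not offer.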

Question: Which Safeguard Against SQL Injection Is Best?

This is a trick question. My answer is that I filter input and use prepared statements. (I escape output if prepared statements are not available.) There is no single best approach, although I give props if you assert that prepared statements are better than the alternatives.

Why? That's just good security!

Security is not an impenetrable wall. It is a decently-sized wall, with a moat in front of it, a mountain surrounding all of it but the entrance, and a good number of guards on the battlements.

I can tell stories for hours about people who haven't understood this principle and have paid the price. But, in this case, it's easier to ask you about the following cases:

  • What if the data is a person's name, and he's Tim O'Reilly?

  • What happens if, at a later date, you decide to use the data in a different context (such as the filesystem, memcache, or HTML) prior to or instead of storing it in the database?

  • What if you migrate from MySQL to SQL Server, which has a different method for escaping? (Using '' instead of \' to represent an escaped single quote.)

  • What if you migrate to a data store that doesn't support prepared statements?

Things change. The principle here is if the security protocol is inconvenient, always implement it as early as possible in the application flow. More on this later.

Question: Create a Single Audit Point for Injection Attacks.

People always miss the earlier questions, so I never ask this anymore. I'm tired of having headhunters talk shit about me behind my back to everyone in the Bay Area.

My answer would be a Data Access pattern. If you're a framework guy, you can use a persistence layer like ActiveRecord to abstract yourself entirely from the database, because, apparently, LEFT JOIN is just too damn hard for you.

Given how many people fail this series of questions in interviews, I'm inclined to agree.

Question: What Is Cross-Site Scripting (XSS)? Cross-Site Request Forgery (CSRF)? Session Fixation? Give Me an Example of Each.

The reason I ask for examples is to give you the opportunity to apply this stuff in practice. To think like an attacker shows real, practical knowledge beyond the simple theory. Besides, the principles come from the practice.

Wikipedia has decent definitions of XSS, CSRF, and session fixation.

I'll confess that the only reason I ask about session fixation is because I occasionally meet a candidate who can regurgitate XSS and CSRF descriptions, and the sadistic side of me wants to see if I can break them. Asking about session fixation is like asking how to laugh in hexadecimal. (Answer: 48 41 48 41.) It's an obscure vulnerability known by us old-timers that is easily corrected and fun to lord over people.

CSRF is especially important because of the abundance of Ajaxified web sites. But, the one I'm really interested in is XSS, because I focus on it later in the questioning.

Question: Explain How the MySpace Worm Works. Give Me an Example that Uses CSRF to Determine the Login State on a Remote Site.

I don't typically ask these questions, but some of the more belligerent candidates bitch about the previous series as being too pedantic and "just about terminology." This dismissal is the interview equivalent of "I'm not really into Pokémon." Do they think I ask these questions for fun?

You Better Be Into This Pokémon

If you can't answer the above two questions, figure them out on your own. Understanding how to answer these is how you'll develop a zen-like ability to quickly understand security vulnerabilities and be able to build security practices once you've taken the time to understand these simple security principles. (In this case, you must combine XSS, CSRF, and Javascript exceptions.)

Question: What Does "Filter Input; Escape Output" Mean? Give Me an Example of Each.

You can reference Wikipedia’s definition, but put simply, it is the principle that filtering should be done as soon as data enters, and escaping should be done just before it exits.

Even if people can intuit that or have already heard it (surprisingly few candidates have, but most can guess), many haven't contemplated what filtering and escaping really mean. That's why I ask the other questions.

You can apply this principle to SQL injection and XSS (and any other injection attack):

  • Filtering can help protect against SQL injection by removing all single quotes. (This isn't foolproof, because not all SQL injection attacks require a single quote.)

  • Filtering can help protect against XSS attacks by removing HTML tags with strip_tags(), ensuring the data adheres to a specific pattern with regular expressions, or using a combination of HTML normalization and a DOM walker that implements a whitelist or blacklist filter. These techniques remove the <script> tags as well as injections into CSS. (You may need a CSS parser unless you strip out all style attributes and <style> tags.)

  • Escaping can help protect against SQL injection by maintaining the distinction between the SQL query and the data. Use something like mysql_real_escape_string() or prepared statements.

  • Escaping can help protect against XSS attacks by maintaining the distinction between the HTML and the data. Use htmlspecialchars() or htmlentities(). Both will do things like replace < with &lt; but htmlentities() does a bit more if you know the output is HTML and not XML. (Be sure to match the character encoding of your Content-Type header.)

It might have helped if we had called it encoding instead of escaping, but we don't, so deal.
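The difference between filtering and escaping for XSS is easiest to see side by side. A minimal sketch, using a hypothetical comment as input:

```php
<?php

$comment = '<script>alert("xss")</script>Nice post!'; // hypothetical input

// Filtering removes the markup. strip_tags() drops the tags themselves
// but keeps the text that was between them.
$filtered = strip_tags($comment);

// Escaping keeps every character but encodes the ones HTML treats as
// special, so the browser shows the tag as text instead of running it.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $filtered, "\n";
echo $escaped, "\n";
```

The filtered value is `alert("xss")Nice post!`, and the escaped value is `&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;Nice post!`. Filtering changed the data; escaping only changed its representation for one destination.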

From this, the rest follows.

  • Filter on input to adhere to the practice of implementing as many security safeguards as possible as early as possible. When data enters the application, filter it first. (Some candidates misinterpret this as filtering in the client side code. That's easily avoided by the most inexperienced attackers. They forget that they're PHP developers, not front end developers; we are talking about input into the PHP application.)

  • Escape on output, because escaping functions are going to be different depending on where the data goes. If it goes to a MySQL database, you use mysql_real_escape_string(). When using it as an argument on the command line, use escapeshellarg(). When sending it back to the user in HTML, use htmlentities().
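The two points above can be sketched together: filter once, where the data enters, and escape separately for each destination. The function names, the filename rule, and the sample values below are all hypothetical, and htmlspecialchars() stands in for htmlentities() as the HTML escaping function.

```php
<?php

// Filter on input: accept only what a hypothetical "filename" field
// should contain, and reject everything else.
function filter_filename($input)
{
    return preg_match('/^[a-z0-9_-]+\.(?:php|html|js)$/', $input) === 1
        ? $input
        : NULL;
}

// Escape on output: the right function depends on the destination.
function for_html($data)
{
    return htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
}

function for_shell($data)
{
    return escapeshellarg($data);
}

$title = 'Tom & Jerry <3'; // hypothetical data headed to two destinations

echo '<h1>', for_html($title), "</h1>\n";
echo 'grep -c ', for_shell($title), " notes.txt\n";

var_dump(filter_filename('index.php'));
var_dump(filter_filename('../etc/passwd'));
```

The same $title is encoded one way for HTML and another way for the shell, which is why the escaping cannot be done once at input time. The filter, by contrast, does not care where the data is going; it only decides whether the data is acceptable at all.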

These principles apply to XSS just as they did to SQL injection:

  • What if you want the HTML to be output as HTML, because your MySpace-like site has HTML editing and customization? You can't escape, but you can still filter. You've already filtered on input, right?

  • When you know it isn't HTML, you should always escape for HTML on output to an HTML template. By doing so, you are protected against XSS. XSS forms the foundation of many attacks such as session hijacking and CSRF worms, so that's a good thing.

  • Therefore, you apply both security practices, because neither can offer complete coverage. And, because you don't want to rely on security being an impenetrable wall, right?

This leads to further understanding of the concept of input and output. Input doesn't necessarily mean input from the user; it means input into the application. Output doesn't necessarily mean output to the user as HTML; it means any output from the application to an external source.

PHP does what it does best but with the hard-won security principles tacked on. In the old days, this meant gluing the user to a database back end and back, but on a modern website, this means so much more. We've gone from a "3-tier" or even an "n-tier" architecture to a complicated bunch of highly-cohesive, external services, one of which is the user of the web site.

Aside: Principle to Practice

Here's an example of how I applied the above principle of "filter input; escape output" at Tagged. Because we're a MySpace-like social network, we have to base our input filtering of certain fields on a blacklist of illegal tags, properties, and URLs instead of a whitelist of allowed tags (which is more common among many libraries). I knew I could not "build an impenetrable wall" with a blacklist; the spec is always changing and there is an "infinite mass of shit" in security.

Instead, I did what I had time to do, and I applied the principle of "filter input; escape output" to the system architecture, not just with a kick-ass user input filter (filter input); user output (remember, I can't always escape the HTML); and escaping for the database; but also on input from the database and memcache stores.

I encoded the version number of the HTML input filter on output to the database and memcache, so on input, I could check to see if it was out of date and run the HTML filter again!

Why? Because, one day, we were hacked. Within hours, a rogue XSS worm, injected into the style tags of the widgets on our site, had infected 60,000 user profiles. Even if we stopped it on user input, we still would have the 60,000 infected user profiles to deal with, spanning across all the databases in the federation, containing 50 million user profiles. It would have taken days to clean the mess!

Instead, I asked for a copy of the exploit, figured out the nature of the attack, added a CSS parser (which I had lying around, because I knew about this attack but was too lazy to look up its exact nature), hooked it up to the HTML filtering object, and bumped the version number.

All new uploaded content was filtered against it. And, as users used the site, they were fixing it (in memcache). Then, at our leisure, we could test the new filter against regressions and slowly remove the attack permanently from the database, all using the same code, and all because of the FIEO principle.
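The version-stamping idea can be sketched in a few lines. This is not the Tagged code; every name here is hypothetical, a plain array stands in for the database and memcache records, and the two regular expressions are a crude placeholder for a real HTML filter and CSS parser.

```php
<?php

// Bump this whenever the filter learns to block something new.
define('HTML_FILTER_VERSION', 2);

// Stand-in for the real HTML filter: version 2 also drops <style>
// blocks. A production filter would be far more thorough.
function filter_html($html)
{
    $html = preg_replace('@<script\b.*?</script>@is', '', $html);
    $html = preg_replace('@<style\b.*?</style>@is', '', $html);
    return $html;
}

// Output to the store: tag the content with the filter version.
function pack_profile($html)
{
    return array('v' => HTML_FILTER_VERSION, 'html' => filter_html($html));
}

// Input from the store: re-filter anything written by an older filter.
function unpack_profile(array $record)
{
    if ($record['v'] < HTML_FILTER_VERSION) {
        $record = pack_profile($record['html']);
    }
    return $record;
}

// A record written before the filter knew about <style> injection.
$stale = array('v' => 1, 'html' => '<p>Hi</p><style>p{x:expression(evil())}</style>');
$fresh = unpack_profile($stale);

echo $fresh['v'], ' ', $fresh['html'], "\n";
```

Reading the stale record yields version 2 content with the style block removed, without anyone having to sweep the whole data store first.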

What is Magic Quotes, and Why Is It Bad?

This is the gravy. Many of you know "magic quotes is bad." But, did you know that it's monotonically bad? (For instance, a case can be made for register_globals, but none can be made for magic_quotes_gpc.)

You say this simply:

Magic quotes is bad, because it escapes input, and you should filter input and escape output!

Hard experience has taught people developing large PHP sites that you should only escape on output. Anyone who has written PHP code that is deployed at scale or on a variety of hosted services knows about the nightmare that is magic_quotes_gpc. Magic quotes has the implicit assumption that the output of all input is a MySQL or PostgreSQL database, and the attacker is not very clever.
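To make the problem concrete, the sketch below simulates what magic quotes did to incoming data by calling addslashes() by hand, then sends the result to HTML instead of a database. The sample name is arbitrary.

```php
<?php

$input = "Tim O'Reilly";

// What magic_quotes_gpc did to every GET, POST, and cookie value on the
// way in: the equivalent of addslashes().
$magic = addslashes($input);

// The same value sent to HTML instead of a database now carries a stray
// backslash, because it was escaped for the wrong destination.
echo htmlspecialchars($magic, ENT_QUOTES, 'UTF-8'), "\n";

// Escaping at output time, for the actual destination, avoids this.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
```

The first line of output contains `O\&#039;Reilly`, backslash included; the second contains `O&#039;Reilly`. The data was escaped on input for a destination it never reached.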

This brings us full circle to the Perl developers who continue to argue that you can (and should) escape the input. Perl is "311 code" (chmod 311 *.pl); writer can write and execute, his team and the world can execute, nobody can read. TIMTOWTDI means the Perl developer simply isn't used to the concept that their code will be broken into parts and edited by a team of people.

PHP may have its roots in "Rasmus wants to build his personal home page, and he wants a template tool to do it," but it now has to power large-scale, complex web sites such as Yahoo! and Facebook (and Tagged).

Code has become complex. Magic quotes made sense when we were naïve about input and output; input meant from the user, and output meant to the database; magic quotes made sense when we didn't really understand the difference between filtering and escaping. Many PHP developers have since spent years trying to remove things like magic quotes from the wreckage of late-night debugging sessions. Are you going to learn from their experiences, or are you going to doom yourself to repeat them?

That is the nature of security; practice has created good principles. These principles make for good practice.

Now We're Done

Can you see now why I don't understand why I have headhunters talking shit about me behind my back? I guess they really want web sites to fall flat on their asses and be poorly-architected, vulnerable piles.

See that picture of me at the top of this entry? I'm the PHP Security Grinch, and I'm here to tell you that my heart isn't going to grow three sizes this season, and I won't be saying my questions are too hard. Bah! Humbug!

Why? Because I actually want your site to stay online; I want you to be able to enjoy this holiday season without the fear of being "on call" for a late-night security session.

Parting Shot

Returning to the "security is a luxury" statement that began this entry...

One candidate's resume listed web app security under skills. Of course, they got this battery of questions from me, and, as luck would have it, it was especially egregious. I was about to move on, but the candidate, clearly frustrated by this experience, said to me:

Look, if you give me a web site, I can make it secure.

I was crestfallen; the candidate didn't know what an SQL injection attack was! Then, I realized he was right. Shit. I can make a web site secure, too; disconnect it from the Internet.

So, I suppose this is as good of an Advent tip as any:

If you absolutely have to make your web site secure this Advent, go to the colo and pull out all your network cables.

Happy Holidays!

PHP Advent Calendar Day 12

Today's entry is provided by Ed Finkler.

Ed Finkler

Name
Ed Finkler
Blog
funkatron.com
Biography
Ed Finkler is the Web and Security Archive Administrator for CERIAS at Purdue University. As a member of the PHP Security Consortium, he is the project lead on the PhpSecInfo and Inspekt security tools. He is also the creator of Spaz, an AIR-based Twitter client that won Best HTML Community App in the Adobe AIR Developer Derby.
Location
Lafayette, Indiana

Admitting ignorance is scary; for most of us, it's embarrassing. We fear ridicule and never want to seem stupid. But, this perspective is ridiculous, because we all possess great gaps in our knowledge of the world and how it works. You're almost certain to know more about some topics and less about others than the average person.

Every time I have my car serviced, I'm reminded of this, because I know very little about the subject. I don't know what questions to ask, the right way to ask them, or the right nomenclature. Rather than barrel ahead and make assumptions, I do two things:

  1. I ask questions. "Can you explain what the carburetor is?" I might not get the right answer, but I usually come away with a better understanding of what's going on, which helps me feel a little less like an idiot.

  2. I make sure I express what I need. "I have to be somewhere in an hour. Should I get a ride, or will the work be done in time?" I'm much more likely to get satisfactory service when I make it clear what I need.

With a topic we know more about, like web app development, it's even easier to make assumptions. It's also scarier to admit our assumptions and question them, because we're doing it in front of our peers. "They might discover my ignorance!"

So, the people who are willing to go out on a limb and test their assumptions have a great deal of respect from me, because they are almost inevitably the people who are confident in their abilities and don't care much about what other people think of them. They're interested in truly studying a subject and understanding it better.

Some classic assumptions made by many PHP developers include:

  • PHP is the best programming language for web applications.

  • Your application will fail if you don't use a full-stack MVC framework.

  • Frameworks slow down development.

  • Caching will solve your performance problems.

  • Python, Perl, and Ruby are inferior to PHP.

  • Object-oriented programming is superior.

  • Object-oriented programming is slow and needlessly complex.

I know I've been guilty of making these types of assumptions dozens of times, and as a consequence, I've often been unable to see the value in other approaches. I've missed out on valuable new ideas and interesting, effective techniques. Assuming, rather than knowing, closes off possibilities to us, preventing us from making the most effective choices we can.

One of the best ways I've found to challenge my assumptions is to dive in and learn about many topics. When I go to conferences, I've almost always found that the most interesting, inspiring talks are the ones on technologies or subjects I've not previously explored. For example, when I go to OSCON, I find a lot of value in attending talks in the Python and Ruby tracks. I almost always come away with a better understanding of the topic and gain new ideas to apply to my PHP-based work.

So, are you a CakePHP user? Try making an application with CodeIgniter. Are you a long-time Smarty user? Give plain PHP templating a test-run. Been programming in PHP for nearly a decade? Go through a Ruby tutorial. You'll be a better developer for it.

PHP Advent Calendar Day 11

Today's entry is provided by Ben Ramsey.

Ben Ramsey

Name
Ben Ramsey
Blog
benramsey.com
Biography
Ben Ramsey is a software architect at Schematic and the founder of the Atlanta PHP user group. He is the co-author of php|architect's Zend PHP 5 Certification Study Guide. He also spends way too much time in #phpc.
Location
Atlanta, Georgia

I frequently receive email messages, am asked at conferences or Atlanta PHP meetings, or am approached on IRC about how one can get involved in the PHP community. Being involved means different things to different people. Some just want help solving a particular problem. Others want to connect with fellow PHP aficionados and build mutually beneficial friendships, helping each other grow into better programmers. Still, there are those who want to contribute back to the language by devoting their time and skills to the betterment of a relevant project such as a PECL extension, a PEAR package, or PHP itself; writing documentation for the PHP manual; or writing articles and tutorials to help other developers.

As you can see, it's difficult to sum up with one or two pointers the vastness of the PHP community and how one can get plugged into it. People's interests and desired levels of involvement differ. Indeed, there are hundreds upon hundreds of PHP forums and help sites. It's often difficult to single out why one is better than the other. Still, there are voices that rise above the din, and, as my gift to you this holiday season, I would like to break down these outlets for getting plugged in according to the three main levels of interest I mentioned earlier: getting PHP help, connecting with the PHP community, and contributing back to the PHP community.

I will certainly miss something, so please feel free to leave a comment with your own suggestions.

Getting Help

Most newcomers to PHP just want to get help, and they want help quickly. If this is you, then you're in luck. Below are some of the most excellent resources a community can offer.

PHP Manual
I know it's become cliché to say things like RTFM in response to questions on the PHP mailing list or in a forum or IRC channel, but the fact of the matter is that the PHP manual is the best and most comprehensive manual for any programming language. Use it!
PHP Mailing Lists
The PHP general user list is still the quintessential place to go for help. Some very smart PHP developers, many of whom are PHP developers by profession, frequent the list. Its high traffic almost guarantees a quick response to your questions. Be sure to search the list archives first; your question may already be answered.
##php on Freenode
If you want help in real-time, the ##php channel on Freenode is an excellent place to find help. With about 500 people present at any time, this channel has become the primary, yet unofficial, PHP help channel.
Training
I am often asked about PHP training courses. While I do know that some local colleges and universities might provide PHP training courses (check with your community college; I know mine has PHP and MySQL courses as part of their continuing education program), the two most prominent places to get PHP training online are through php|architect and Zend.

Connecting with the Community

There are numerous ways to connect with the PHP community. The places I list here are merely from my own experience.

Planet PHP
Perhaps one of the best ways to connect with the community is to read what the community has to say. Planet PHP is an aggregation of many of the prominent blogs from around the PHP community. You'll read PHP tutorials and HOWTOs, thoughts about technology, plans for the future of PHP, and even plans for the weekend.
PHPDeveloper.org
PHPDeveloper.org is the PHP community's premier news site. Collecting the top news stories from around the PHP community on a daily basis, PHPDeveloper.org keeps the PHP community informed and up-to-date on the progress of various community projects and events.
php|architect
Although reading php|architect can be considered a way to get help, I consider it an interface to the community, which is why I list it here. It offers high-quality articles and commentary from the PHP industry's top professionals and core language contributors. For unique insights into the PHP community, be sure to check it out.
PHPCommunity.org
For last, I save the PHP Community project. This project began in December 2003. In my opinion, this was the most promising community project that never quite took shape, and it has a legacy that no other community site can rival. It lives on as the #phpc channel on Freenode. On this channel, you will find many of the movers and shakers of the PHP community hanging out and chatting about everything from database abstraction layers to beer. Keep in mind that this is not a help channel; it is a community channel! And keep this project's web site on your radar as well; I hope to see new and exciting things come of it in the coming year.

Contributing Back to the Community

Finally, if you fall into the group of those wishing to give back to the community, there are a few important places you'll want to visit and some steps you might want to take.

Start a Blog
This is not necessarily the first step to take, but it's usually the first thing I tell people who want to get involved in the PHP community in a big way. Start a blog, make a few friends (see the #phpc channel above), get on their blog rolls, help others by writing about your experiences and how you solved particular problems, and share informed opinions on topics that affect the community. This is a great way to get involved.
Develop Content for the Community
Developing content for the community is a great way to give back. You can write articles for any of the aforementioned publications. (They pay.) Zend Developer Zone also accepts podcast submissions. In addition, many members of the community can connect you with book publishers, should you have the itch to write a book.
Contribute to the PHP Documentation
Another great way to get involved is to help improve the PHP documentation. As I mentioned earlier, the PHP documentation is perhaps the best documentation of any programming language, but that's because of the community of volunteers contributing their time to document new language features and improve the existing documentation. The documentation team needs you!
Write or Maintain a PECL Extension or PEAR Package
There are many, many PECL extensions and PEAR packages that have fallen dormant. Contact the maintainers or the appropriate mailing list to take over management of a dormant project, or submit your own extension or package for review.
Fix PHP Bugs
Search through the open bug list and find a bug you can fix. Check out the PHP source, fix the bug, and submit a patch to the bug report.
Contribute to the PHP Core
Since PHP is a community project, you may contribute to the PHP core itself, provided you have a good idea and those on the internals mailing list want to accept your patch. Check out the PHP source, modify it to include your contribution, and submit the patch to the internals mailing list with a subject line that starts with [PATCH]. Before doing so, my advice is to hang out on the internals mailing list for a while and get a feel for the topics and conversation there. You don't need a reputation in the PHP community to submit a patch, but it doesn't hurt. :-)

When I was transitioning from Classic ASP development to another language, I came very close to using Java and JSP as my tools of choice for developing web applications. However, the PHP community is what drew me to this language. Several years later, the PHP community is still the most open and welcoming community for new and seasoned developers alike. Finding an inroad into the community is not difficult. All it takes is a little initiative.

Welcome to the community!

PHP Advent Calendar Day 10

Today's entry is provided by Chris Cornutt.

Chris Cornutt

Name
Chris Cornutt
Blog
blog.phpdeveloper.org
Biography
Chris Cornutt is the senior editor of PHPDeveloper.org, a popular PHP news site, as well as a lead PHP developer at a Texas natural gas distributor.
Location
Dallas, Texas

You're hacking your way through yet another project and you hit that point. You know the one; you've been going and going, churning out code like the end of the Internet is coming tomorrow, and you sit back after finishing off a long haul and are hit by something. You look at where your code is now, and reviewing how you thought of it about five hours ago, you realize there's no next step. You know you need to keep going, but you're just not sure where. If you take the wrong step, you could end up with a big bowl of spaghetti, not the elegant application you're going for.

This is a prime example of bad planning practices. Many developers forget to take this all-important step before they lay their fingers to the keys. They lay out the basic idea in their heads and just start typing, and, unless they're really lucky, they hit a wall somewhere.

I've got a few tips to help:

  • Always plan. Yes, I know it's simple, but even if it's just a rough layout or a tree structure doodled on a cocktail napkin, it's something. A big process isn't required to plan out your application.

  • Have a clear goal in mind. Sit down and hash out what you want your application or web site to do, and write it all down. The less that's left to the imagination, the better. That'll help you get the basic structure down, then you can add the bells and whistles.

  • Know what you have to work with. Nothing sucks more than when you have a great idea and have basically written up the code in your head only to find out that the machine you're working on doesn't have the right version of PHP. Know your base.

  • Don't guess. If you don't know whether a certain method will work for your application, poke around. See if there's anyone else that has done it that way before. There might even be an example of it that you can use as a guide. Guessing can lead to big headaches in the future.

  • Pick your environment carefully. Just because you have a laptop and can work anywhere doesn't mean you should. Any programmer out there knows that they work best in certain situations. Figure out what that is and go with it. I'm not suggesting you crank Metallica at the office (maybe some headphones?), but do find what suits you, even if it's just a keyboard that makes things feel a bit more like home.

  • As tempting as it is, save the pretty part for last. Most developers I know build their applications while dividing their attention between the back end and the front end. This is distracting. Sure, building out an interface is fine, but don't get too wrapped up in it. If you have a designer, rely on them. If not, finish the programming as much as possible first, then come back to the look and feel later. You are keeping the interface and business logic separate, right?

  • Distractions are the devil. Most programmers I know aren't too fond of distractions, so they try to avoid them at all costs. I know how much it pains some of you out there (it hurts me, too), but turn off the email client, shut down the instant messenger, switch off the phone (if you can get away with it), and focus on coding. Not only will this help with productivity, but you can also use the same tricks when you're planning out your application. Fewer distractions == more focus == better structure.

I got a little wordier than I probably should have, but hopefully something in this list can be added to your practices. Best of luck to you in 2008 and remember, you can always start planning now for tomorrow's code!

PHP Advent Calendar Day 9

Today's entry, provided by Ivo Jansch, is entitled Design Patterns.

Ivo Jansch

Name
Ivo Jansch
Blog
jansch.nl
Biography
Ivo Jansch is CTO of Ibuildings, a UK and Netherlands based PHP service company. Ivo is an active blogger in the PHP community, does occasional consultancy work for Zend Technologies, and is the author of the business framework ATK.
Location
Netherlands

In my work, I visit a lot of companies that work with PHP, and I get to talk to a lot of PHP developers. I have noticed that, although most of them are familiar with object-oriented programming, the concept of Design Patterns is often new to them.

Design patterns are best practice solutions for common programming challenges. Sometimes, people confuse them with a library, or snippets of code, but they're more a blueprint for code than an actual piece of code. This way, the solution is independent of the language you're working in. Most design patterns can be implemented in any object-oriented language.

Imagine you are working on part of an application, and you need to make sure that a particular object you created is instantiated only once. If other developers are using your classes, you don't want them to create multiple objects. For example, your class works with an external resource that only accepts a single connection, and your class regulates this, so you can't have multiple instances of your class connecting to this resource.

The solution to this problem is the singleton pattern. This pattern is one of the most common design patterns and is fairly easy to implement. The singleton pattern is based on the following solution:

  • Make the constructor of the class private, so nobody can call it, except the class itself.

  • Create a static method that returns the one and only instance of the object. If there is no instance yet, create it when the first call to this method is made. This method is usually called getInstance().

See how this is just a simple specification with no code?

The following is an example implementation in PHP:

<?php
 
class MyObject
{
    private static $_instance = NULL;
 
    private function __construct()
    {
    }
 
    public static function getInstance()
    {
        if (self::$_instance == NULL) {
            self::$_instance = new MyObject();
        }
 
        return self::$_instance;
    }
}
 
$obj = MyObject::getInstance();
 
?>

In other languages, the result is similar.

The singleton pattern is probably the easiest design pattern, but it is a good example of how a design pattern is just a common solution to a common problem.

There are many more design patterns, such as iterators to loop over things generically, proxies to control access to particular classes, decorators to add functionality to a class without changing the class itself, and the MVC pattern, a popular pattern for web development. The MVC pattern provides a clean solution for separating business logic (model), layout (view), and application control logic (controller).

The MVC pattern is a popular pattern used by many frameworks, such as the Zend Framework. Creating applications using this pattern can make them easier to understand and easier to maintain.
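Of the other patterns mentioned above, the decorator is perhaps the easiest to sketch in a few lines. The following is only an illustration, not a canonical implementation; the Greeter interface and both classes are hypothetical names invented for this example. The decorator implements the same interface as the object it wraps, so calling code does not need to know the difference.

```php
<?php

// Hypothetical interface and classes, for illustration only.
interface Greeter
{
    public function greet($name);
}

class PlainGreeter implements Greeter
{
    public function greet($name)
    {
        return "Hello, $name";
    }
}

// The decorator wraps another Greeter and adds behavior
// without modifying PlainGreeter itself.
class ExcitedGreeter implements Greeter
{
    private $_inner;

    public function __construct(Greeter $inner)
    {
        $this->_inner = $inner;
    }

    public function greet($name)
    {
        return strtoupper($this->_inner->greet($name)) . '!';
    }
}

$greeter = new ExcitedGreeter(new PlainGreeter());
echo $greeter->greet('world') . PHP_EOL;
```

Because the wrapper and the wrapped object share an interface, decorators can also be stacked on top of each other.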

If you want to take advantage of design patterns, how can you possibly know which patterns exist and which problems they solve? An important resource is the book that started it all, Design Patterns: Elements of Reusable Object-Oriented Software by the "Gang of Four." This book, which is used in most computer science courses, explains the basics as well as most of the patterns.

There's another book that's more relevant to PHP developers, php|architect's Guide to PHP Design Patterns by Jason Sweat. Not all patterns that exist are useful when programming PHP. In this book, Jason discusses the ones that are relevant to PHP and gives examples that implement them.

A final resource for design patterns is Wikipedia. It contains very extensive documentation on patterns, and for some it provides implementation examples in several languages, including PHP.

If you have a large code base and are looking for ways to improve the maintainability of your code, have a look at design patterns. They can be very powerful.

PHP Advent Calendar Day 8

Today's entry, provided by Matthew Weier O'Phinney, is entitled Don't Reinvent the Wheel.

Matthew Weier O'Phinney

Name
Matthew Weier O'Phinney
Blog
weierophinney.net/matthew/
Biography
Matthew Weier O'Phinney is currently a PHP developer for Zend Technologies, and is lead developer for the Zend Framework MVC and server components. He has a number of open source contributions under his belt, and wishes he had more time in the day for raw coding.
Location
Richmond, Vermont

Developers are a strange breed; we all know that others have developed libraries and components that we can use, but we have an almost insatiable desire to do it ourselves. Some call it the NIH syndrome; others feel they can do it better, or simpler, or faster. We all succumb to it at one point or another as we mature as developers; I've heard the quote that the average PHP developer has developed 2.5 frameworks.

However, writing your own code all the time is a serious waste of your time. Why write yet another RSS feed parser, or another data table gateway, or another logger, or another mailer? The time you spend doing these things is time wasted; you can get more work done using somebody else's code, which ultimately means you can complete more projects and earn more money (or help a non-profit organization achieve its mission).

Additionally, a good developer repeats the mantra Don't Repeat Yourself to themselves constantly. While this typically means avoiding code duplication in your source tree, it can easily be extended to mean Don't Repeat Others. Don't go rewriting what others have already written for you.

Finally, with well-established projects, you benefit from having had many people review the code. This means that most design issues will have been resolved, often by people smarter than you (or by collective intelligence), and many, if not all, bugs will have been identified and fixed. It also means that the community will continue to fix problems, and you won't necessarily need to.

PHP has been around for a good many years now, and there are many places you can look to for quality code:

SPL
The Standard PHP Library is a set of interfaces and classes that have hooks into the language and allow for a lot of sophisticated OOP usage. I've seen a number of people wanting to create Container or List classes; look no further than ArrayObject, which allows you to create classes that can also look and feel like arrays, including letting you sort the items.
PEAR
The PHP Extension and Application Repository has rigorous requirements for accepting new components. Perhaps its greatest strength, however, is its collection and establishment of standards: how to document your code, requirements for testing code, and more. Components written for PEAR tend to be very high quality.
PHPClasses.org
The PHP Classes Repository offers little barrier for submission, but the user ratings allow you to filter and find those that other developers have found most useful or best implemented.
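As a rough illustration of the ArrayObject point above, the sketch below extends it to get a container that behaves like an array. The Playlist class name and its contents are made up for the example.

```php
<?php

// Hypothetical container class; the name and data are illustrative.
class Playlist extends ArrayObject
{
}

$playlist = new Playlist(array('Silent Night', 'Jingle Bells', 'Deck the Halls'));

// Array-style access works on the object.
$playlist[] = 'Auld Lang Syne';
echo count($playlist) . PHP_EOL;
echo $playlist[0] . PHP_EOL;

// Sort the items by value; asort() preserves the keys.
$playlist->asort();

foreach ($playlist as $key => $title) {
    echo "$key: $title" . PHP_EOL;
}
```

Everything beyond the empty class body comes from SPL, which is the point: there is no list code to write or maintain.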

In addition to these, there are a number of competing component libraries and frameworks:

Many of these projects require their developers to unit test the code prior to release; for you, the end-user developer, this means you can be assured that the code will work as specified, and continue to do so in the future as new features and improvements are provided. This will in turn save you additional time, time not spent debugging when an upgrade is performed.

So, next time you need to add a feature to your site, consider searching to see if someone else has done so already. If you find someone who has, but the code doesn't live up to your standards or needs, instead of dismissing it and starting your own, try collaborating with the author. This way, others can benefit from your skills as well, and you don't pollute the Web with yet another solution to the same problem.

PHP Advent Calendar Day 7

Today's entry, provided by Elizabeth Smith, is entitled SPL to the Rescue.

Elizabeth Smith

Name
Elizabeth Smith
Blog
elizabethmariesmith.com
Biography
Elizabeth Smith is a PHP Windows geek, lover of all things PECL, PHPWomen.org charter member, PHP-GTK 2 developer, and generally involved in doing bad things with PHP for fun and profit.
Location
Sturgis, Michigan

I like doing command line scripting with PHP; it's fun and simple. Often, I need to manipulate batches of files by recursively iterating over files in a directory and all subdirectories. Of course, I want to do it quickly and efficiently, so I can get my Christmas shopping done. Long ago, there was opendir(), and then came scandir(), but to make these functions recursive involved convoluted looping that makes my head hurt. I try to avoid recursive functions, probably because I also tend to forget something and end up with infinite loops. PHP 5 with SPL has really nifty tools to make life easier, I promise! Below is my solution for recursively iterating over files in any directory. I'll give it the exciting name of RecursiveFileIterator.

<?php
 
class RecursiveFileIterator extends RecursiveIteratorIterator
{
    /**
     * Takes a path to a directory, checks it, and then recurses into it.
     * @param $path directory to iterate
     */
    public function __construct($path)
    {
        // Use realpath() and make sure it exists; this is probably overkill, but I'm anal.
        // realpath() returns FALSE for a missing path, so keep $path for the message.
        $realPath = realpath($path);
 
        if ($realPath === FALSE) {
            throw new Exception("Path $path could not be found.");
        } elseif (!is_dir($realPath)) {
            throw new Exception("Path $path is not a directory.");
        }
 
        // Use RecursiveDirectoryIterator() to drill down into subdirectories.
        parent::__construct(new RecursiveDirectoryIterator($realPath));
    }
}
 
// This is how you use it.
foreach (new RecursiveFileIterator('/path/to/something') as $item) {
    // Because $item is actually an SPLFileInfo object, echo gives you the absolute path from __toString() magic.
    echo $item . PHP_EOL;
}
 
?>

See what little code that takes? I added lots of comments (because I'm comment crazy) and checks (because I'm like that when I code), but can you imagine if I had used recursive functions and scandir()?

I'm being extremely boring and simply echoing the absolute path to the file. You aren't limited to that; you actually get an SPLFileInfo instance when you use RecursiveDirectoryIterator(), and it gives you all sorts of goodies such as timestamps, permissions, owner, and more.
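To give a feel for those goodies, here is a small sketch that prints a few of them. It assumes a throwaway directory created under the system temp directory (the directory and file names are made up), and it passes the SKIP_DOTS flag, available in newer PHP versions, so that the "." and ".." entries are not reported.

```php
<?php

// Build a throwaway directory so the example is self-contained.
$dir = sys_get_temp_dir() . '/spl-demo-' . uniqid();
mkdir($dir . '/sub', 0777, TRUE);
file_put_contents($dir . '/sub/example.txt', 'Hello, SPL');

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
);

$sizes = array();

foreach ($files as $item) {
    // $item is an SplFileInfo object.
    $sizes[$item->getFilename()] = $item->getSize();

    printf(
        "%s: %d bytes, modified %s, permissions %o\n",
        $item->getFilename(),
        $item->getSize(),
        date('Y-m-d H:i', $item->getMTime()),
        $item->getPerms() & 0777
    );
}

// Release the iterator before removing the demo directory.
unset($files, $item);
unlink($dir . '/sub/example.txt');
rmdir($dir . '/sub');
rmdir($dir);
```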

When performing voodoo like this, I usually want to not only recursively iterate over files, but also pick which files to look at as well. You can do this with the magic of glob(), but there are some drawbacks. Firstly, glob() is not consistent cross-platform, and secondly, glob() can be really slow, especially on Windows. I prefer to use the magic of SPL's FilterIterator combined with the code above. Your accept() method can be anything you like; below is my generic version for file extension checking. I'll give it the exciting name of RecursiveFileFilterIterator. Just tell it what file extensions to allow and away you go.

<?php
 
class RecursiveFileFilterIterator extends FilterIterator
{
    /**
     * acceptable extensions - array of strings
     */
    protected $ext = array();
 
    /**
     * Takes a path and shoves it into our earlier class.
     * Turns $ext into an array.
     * @param $path directory to iterate
     * @param $ext comma delimited list of acceptable extensions
     */
    public function __construct($path, $ext = 'php')
    {
        $this->ext = explode(',', $ext);
        parent::__construct(new RecursiveFileIterator($path));
    }
 
    /**
     * Checks extension names for files only.
     */
    public function accept()
    {
        // current() returns the SplFileInfo object for the current entry.
        $item = $this->current();
 
        // If it's not a file, accept it.
        if (!$item->isFile()) {
            return TRUE;
        }
 
        // If it is a file, grab the file extension and see if it's in the array.
        return in_array(pathinfo($item->getFilename(), PATHINFO_EXTENSION), $this->ext);
    }
}
 
// Same usage as above, but you can indicate allowed extensions with the optional second argument.
foreach (new RecursiveFileFilterIterator('/path/to/something', 'php,txt') as $item) {
    // This is an SPLFileInfo object.
    echo $item . PHP_EOL;
}
 
?>

Notice again the small amount of code. The next time you want to do something to a bunch of PHP files, let SPL come to your rescue.

PHP Advent Calendar Day 6

Today's entry, provided by Davey Shafik, is entitled APIs, UIs, and Other Underused Acronyms.

Davey Shafik

Name
Davey Shafik
Blog
pixelated-dreams.com
Biography
Davey Shafik is an author, speaker, and developer with 10 years of experience in web technologies.
Location
Zephyrhills, Florida

Have you ever used a web site and thought, "How can something suck this much?" Sometimes, the ideas are great, but it just feels clunky. The most common cause is a poor user interface.

User interface design focuses on making things feel natural. A good user interface doesn't require the user to think. Typically, if you need to provide instructions on how to use something, you should instead invest more into the usability of it.

There is something good about the web; if you're faced with a problem, chances are good that someone else has already solved it. Search for sites that have similar requirements, and look at how they do things. Does it work? Does it feel natural? What can you improve?

It's important to understand that advancements in user interfaces must be evolutionary, not revolutionary. Time Machine is an exception to this rule, because it's a user interface for something that never had a good one and was never widely used by the target audience.

If you want to learn good principles of UI design, I highly recommend the book Don't Make Me Think by Steve Krug.

Code has similar needs. It should be usable and natural. The less thinking the better. In this sense, user interface design and API design are both critical to building successful applications. The two are also synergistic; the back-end API should be as simple as possible to support a simple, clean user interface.

Good APIs should be immutable. They should never need to change (additions are fine), and while this is an unrealistic goal, it is something you should strive for. A good API is one where the code behind it can be changed without requiring changes to applications that use it.

For example, you might have a getNewsStories() function that reads stories from an RSS feed and returns them as a nice data structure. As time goes by, you realize that you want to cache the stories, maybe to create archives. Now, you need to start reading from your data store, not just RSS. As long as the returned data structure hasn't changed, none of the code calling getNewsStories() has to change.
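A minimal sketch of that idea follows. Only the getNewsStories() name comes from the text; fetchStoriesFromFeed(), the optional cache file argument, and the story fields are assumptions made for illustration, and the feed data is hard-coded instead of being parsed from real RSS.

```php
<?php

// Hypothetical stand-in for the RSS-reading code; a real version
// would fetch and parse the feed.
function fetchStoriesFromFeed()
{
    return array(
        array('title' => 'PHP Advent Calendar', 'link' => 'http://example.org/advent'),
    );
}

// The public API. Callers rely only on the returned structure:
// a list of arrays with 'title' and 'link' keys.
function getNewsStories($cacheFile = NULL)
{
    if ($cacheFile === NULL) {
        $cacheFile = sys_get_temp_dir() . '/news-stories.json';
    }

    // Newer implementation detail: read from the cache when one exists.
    if (is_readable($cacheFile)) {
        $stories = json_decode(file_get_contents($cacheFile), TRUE);

        if (is_array($stories)) {
            return $stories;
        }
    }

    $stories = fetchStoriesFromFeed();
    file_put_contents($cacheFile, json_encode($stories));

    return $stories;
}

$cacheFile = sys_get_temp_dir() . '/news-stories-' . uniqid() . '.json';

$first  = getNewsStories($cacheFile); // Read from the feed; fills the cache.
$second = getNewsStories($cacheFile); // Read from the cache.

foreach ($second as $story) {
    echo $story['title'] . PHP_EOL;
}

unlink($cacheFile);
```

The calling loop is the same whether the stories came from the feed or from the cache, which is what keeps the API stable.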

Finally, it's important to think about the data being returned from your API. Should you return HTML nuggets? What if you want to start serving your data as a web service? You should have two different sides of your API: data retrieval and rendering. By doing this, you leave yourself open to future modifications, either displaying the same information in a variety of ways within your HTML output, or in entirely new ways, like JSON.
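One way to picture the two sides is sketched below. The function names and the sample story are hypothetical. The retrieval function returns plain data, and each output format gets its own rendering function.

```php
<?php

// Data retrieval: returns plain data with no markup. The story is
// hard-coded here as a stand-in for a real data source.
function getStories()
{
    return array(
        array('title' => 'Cats & Dogs', 'link' => 'http://example.org/1'),
    );
}

// Rendering: one function per output format.
function renderStoriesAsHtml(array $stories)
{
    $html = "<ul>\n";

    foreach ($stories as $story) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($story['link'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($story['title'], ENT_QUOTES, 'UTF-8')
        );
    }

    return $html . "</ul>\n";
}

function renderStoriesAsJson(array $stories)
{
    return json_encode($stories);
}

$stories = getStories();

echo renderStoriesAsHtml($stories);
echo renderStoriesAsJson($stories) . PHP_EOL;
```

Adding another format later means adding another rendering function; getStories() does not change.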

I'll leave you with this rule of thumb. If you had to think hard to come up with the idea for a user interface or API, it's probably too complex. Let usability be your guide.

PHP Advent Calendar Day 5

Today's entry, provided by Cal Evans, is entitled Five Resources Every PHP Developer Should Know About.

Cal Evans

Name
Cal Evans
Blog
blog.calevans.com
Biography
Cal Evans is currently the Editor-in-Chief of the Zend Developer Zone; for the previous 7 years, he worked with cool LAMP projects and teams. Before that, he just sat around wishing for something like LAMP to come along, so he could build cool projects.
Location
Nashville, Tennessee

Everybody pretty much knows about the major resources available to the PHP community. Web sites like PHPDeveloper.org, Planet PHP, Zend Developer Zone, and alphaWorks make news, blogs, and good tutorials easy to come by. However, there are several good resources that aren't as widely publicized.

PHPWomen.org
PHPWomen.org's mission is to encourage women programmers to get involved with PHP. They do not exclude men; rather, they encourage women. They post good tutorials, have excellent write-ups on the various conferences for PHP developers, and have an active forum.
DZone
From the creator of Javalobby.org, Rick Ross, comes DZone, "fresh links for developers." Basically, DZone is Digg, but exclusively for developers. Rick strives to keep DZone language agnostic and even reaches out to the PHP community to make sure that we are well represented.
PHP Internals Mailing List
If you really want to know what is going on in PHP, you want to monitor the internals list. This list can have a low signal-to-noise ratio, and you may be more comfortable monitoring it via Steph Fox's excellent, if often late, weekly summaries.
KillerPHP.com
Stefan Mischook has created a web site that delivers on its name's promise. If you are just getting started, his tutorial videos are a great place to start. If you are an old hand at PHP, then his blog is one you won't want to miss.
PHPPodcasts.com
The PHP community has a lot of interesting podcasts and video casts. The problem is that if you don't know where to look, they can be hard to find. PHPPodcasts.com is a resource locator. It's not designed for you to subscribe to, but rather to visit occasionally to see what's new and available.

PHP Advent Calendar Day 4

Today's entry is provided by James McGlinn.

James McGlinn

Name
James McGlinn
Blog
blog.phpdeveloper.co.nz
Biography
James McGlinn is the CTO of Eventfinder (a major New Zealand entertainment site) and founder of the NZ PHP Users Group, now 500 strong. He is a member of the PHP Security Consortium and helps maintain user notes for the PHP documentation.
Location
Auckland, New Zealand

In keeping with the Advent calendar theme, this entry touches on one aspect of the festive season that's difficult to miss. Shopping. Particularly online shopping and making sure the online payment experience is a safe and secure one for your visitors.

One aspect of online shopping that can be difficult to manage is making sure that your sensitive pages are correctly served by the secure server, without slowing your site (and your visitors' experience) down by serving pages over SSL that needn't be.

If you have only one or two pages that need to be secure (the checkout for example), it seems straightforward enough to simply specify absolute links to those pages on the SSL server. But how about when you add a few more secure pages (the login screen, admin system, or perhaps an alternate checkout for affiliate sales)? And, how do you redirect back to the non-secure server when your user navigates away from the checkout system, in order to increase responsiveness and reduce server load? Not to mention the headache that working with absolute (instead of relative) URLs brings to the development process.

The simplest answer to all of these issues is to create a single script, called within the header of your online store pages, to detect whether the page requested should be secure, and redirect to the appropriate server if necessary.

First, set up your configuration:

<?php
 
$hosts = array(
    'secure'    => 'shop.mydomain.com',
    'nonsecure' => 'mydomain.com',
);
 
$secure_pages = array(
    '/checkout/',
    '/login.php',
    '/admin/',
);
 
?>

The first array contains the hostnames for your secure and non-secure servers (assume both share the same document root). The second array contains a list of the directories or specific resources that must be served from the secure server. Non-secure requests that start with one of these strings will automatically be redirected to the secure server.
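The "starts with" matching described above can be tried on its own before any redirects are involved. This is only a sketch; requiresSsl() is a hypothetical helper name, and the request paths are made-up examples.

```php
<?php

// Hypothetical helper: returns TRUE when the request URI starts with
// one of the secure prefixes.
function requiresSsl($requestUri, array $securePages)
{
    foreach ($securePages as $securePage) {
        if (strncmp($requestUri, $securePage, strlen($securePage)) === 0) {
            return TRUE;
        }
    }

    return FALSE;
}

$secure_pages = array(
    '/checkout/',
    '/login.php',
    '/admin/',
);

var_dump(requiresSsl('/checkout/step2.php', $secure_pages));
var_dump(requiresSsl('/products/', $secure_pages));
var_dump(requiresSsl('/login.php?next=/cart', $secure_pages));
```

Because this is a plain prefix match, a request for /checkout without the trailing slash would not be treated as secure, so it is worth listing the prefixes exactly as they appear in your URLs.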

On to the code itself. First, we check the request against our list of secure pages and redirect to the secure server if necessary:

<?php
 
$clean = array();
 
// Reject control characters, such as newlines, in the URL.
if (ctype_print($_SERVER['REQUEST_URI'])) {
    $clean['request_uri'] = $_SERVER['REQUEST_URI'];
} else {
    $clean['request_uri'] = '/';
}
 
if (empty($_SERVER['HTTPS']) && count($_POST) < 1) {
    // Not using SSL and not posting data, so check if we should be using SSL.
    foreach ($secure_pages as $secure_page) {
        if (substr($clean['request_uri'], 0, strlen($secure_page)) == $secure_page) {
            $new_url = sprintf('https://%s%s', $hosts['secure'], $clean['request_uri']);
            header('Location: ' . $new_url);
            exit;
        }
    }
}
 
?>

This will redirect visitors transparently to the secure server, safe in the knowledge that their credit card details can't be intercepted en route to your store. But, what about when the checkout process is complete, or they abandon the procedure part way through to add last minute items to their Christmas basket of goodies? By adding to the code above, we can redirect users still on the secure server back to the non-secure server as required:

<?php
 
// This branch continues the if statement from the previous listing.
elseif (!empty($_SERVER['HTTPS']) && count($_POST) < 1) {
    // Using SSL and not posting data.
    $dont_redirect = FALSE;
    foreach ($secure_pages as $secure_page) {
        if (substr($clean['request_uri'], 0, strlen($secure_page)) == $secure_page) {
            $dont_redirect = TRUE;
        }
    }
 
    if ($dont_redirect === FALSE) {
        // Redirect.
        $new_url = sprintf('http://%s%s', $hosts['nonsecure'], $clean['request_uri']);
        header('Location: ' . $new_url);
        exit;
    }
}
 
?>

This will redirect the user back to the faster non-secure server for those requests where security isn't an issue, without any more intervention on your part.

You'll notice in each of these code samples that we don't automatically redirect POST requests. This is because any data accompanying the POST request would be lost. I leave it as an exercise for you to determine what should be done in that instance and handle the condition accordingly.

PHP Advent Calendar Day 3

Today's entry is provided by Sebastian Bergmann.

Sebastian Bergmann

Name
Sebastian Bergmann
Blog
sebastian-bergmann.de
Biography
Sebastian Bergmann is a long-time contributor to various PHP projects, including PHP itself. He is the developer of PHPUnit and offers consulting, training, and coaching services to help enterprises improve the quality assurance process for their PHP-based software projects.
Location
Siegburg, Germany

Where do most bugs hide in a software project? A small script written in PHP can help us answer this question by mining a version control repository for the relevant information. This assumes, of course, that you are using version control software to manage your project, and that you are using consistent messages when you commit a bug fix, and only touch source code files relevant to the bug fix in that commit.

So, let us assume that we are using Subversion to manage our project's source code, and that we use messages such as "Fix #2204." when a bug fix is committed. We also assume that this script has filesystem access to the Subversion repository. We start with some configuration (repository location) and variable initialization:

<?php
 
// Configure the repository location.
$repository = '/var/svn/phpunit';
 
$paths      = array();
$repository = realpath($repository);
 
?>

The first step is to look for all commits made to the repository for which the commit message matches our bug fix format. The svn log command can help us here. It shows log messages from the repository and does so, optionally, in XML format. PHP's SimpleXML extension provides a very simple and easily usable toolset to parse XML.

In our script, we use shell_exec() to run the svn log --xml command on our repository. The generated XML is then loaded via simplexml_load_string() into an object that we can iterate.

<?php
 
$log = simplexml_load_string(
    shell_exec('svn log --xml ' . escapeshellarg('file://' . $repository))
);
 
?>
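To make the shape of that object more concrete, the sketch below parses a small hand-written sample in the format that svn log --xml produces. The revisions, authors, and messages are made up, and no repository is needed to run it.

```php
<?php

// Hand-written sample in the shape of `svn log --xml` output;
// the revisions, authors, and messages are made up.
$xml = '<log>
<logentry revision="2205">
<author>alice</author>
<date>2007-12-01T10:00:00.000000Z</date>
<msg>Fix #2204.</msg>
</logentry>
<logentry revision="2204">
<author>bob</author>
<date>2007-11-30T09:00:00.000000Z</date>
<msg>Add new feature.</msg>
</logentry>
</log>';

$log     = simplexml_load_string($xml);
$entries = array();

foreach ($log->logentry as $logentry) {
    // Attributes are read with array syntax, child elements as properties.
    $revision = (int)$logentry['revision'];
    $message  = (string)$logentry->msg;

    $entries[$revision] = $message;

    printf("r%d: %s\n", $revision, $message);
}
```

The casts matter: without them, the values are SimpleXMLElement objects, not plain integers and strings.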

For each revision that matches our search criteria, we use the svnlook changed command to get the paths that were changed in that particular revision.

<?php
 
foreach ($log->logentry as $logentry) {
    $attributes = $logentry->attributes();
    $revision   = (int)$attributes['revision'];
    $message    = (string)$logentry->msg;
 
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        $ticket = (int)$matches[1];
 
        $changedPaths = explode(
            "\n",
            shell_exec(
                sprintf(
                    'svnlook changed -r %d %s',
                     $revision,
                     escapeshellarg($repository)
                )
            )
        );
 
        unset($changedPaths[count($changedPaths) - 1]);
 
        foreach ($changedPaths as $changedPath) {
            $changedPath = substr($changedPath, 4);
 
            if (!isset($paths[$changedPath])) {
                $paths[$changedPath] = array(
                    array(
                        'revision' => $revision,
                        'ticket'   => $ticket
                    )
                );
            } else {
                $paths[$changedPath][] = array(
                    'revision' => $revision,
                    'ticket'   => $ticket
                );
            }
        }
    }
}
 
?>

For each source code file that is changed at least once as part of a bug fix, we maintain an array with the revision and ticket number of each fix. In the end, we use uasort() to sort that array and print the source code files that were involved in bug fixes, in descending order by number of fixes.

<?php
 
uasort($paths, 'cmp');
 
foreach ($paths as $path => $data) {
    printf("%4d: %s\n", count($data), $path);
}
 
function cmp($a, $b)
{
    $a = count($a);
    $b = count($b);
 
    if ($a == $b) {
        return 0;
    }
 
    return ($a > $b) ? -1 : 1;
}
 
?>

This entry shows you how easy it is to parse XML data with PHP in order to solve a problem that might look hard at first glance: mining a code repository for data to map past bugs to source code files. The resulting ranking of the most bug-prone source code files is a perfect base to decide which parts of your code base need more tests.

If this got you interested in quality assurance for PHP projects, you might also want to look at the PHPUnit and phpUnderControl projects.

PHP Advent Calendar Day 2

Today's entry, provided by Elizabeth Naramore, is entitled Writing Code is Like Doing the Dishes (5 Reasons Why Documenting Your Code Makes You a Better Coder).

Elizabeth Naramore

Name
Elizabeth Naramore
Blog
naramore.net/blog/
Biography
Elizabeth is an active member of the PHP Community, the co-founder of PHPWomen.org, the organizer of OINK-PUG, and a moderator at PHPBuilder.com. She has also done freelance writing for Wrox/Wiley and International PHP Magazine, and she currently acts as the News Editor and Managing Editor, Books for php|architect. In her day job, Elizabeth works in e-commerce and does web development consulting.
Location
Cincinnati, Ohio

When I had my appendix out last year, and my husband had to go back to work, I needed help. My mother-in-law graciously offered to come stay with us and help. I gratefully accepted, so the next day she was at our house, ready to go. The first task at hand was to clean the kitchen and put the dishes away. It's funny when you're in someone else's kitchen, and you have no idea where things go. The poor soul kept having to ask me where every cup, plate, pot, and pan went. I didn't mind, of course; how else would she know? We all have our little ways of doing things. So then we came to the little kids' plastic cups.

"They go underneath the cabinet," I said. "Down by the pots and pans."

She gave me an odd look, so I explained. "That's so the kids can reach them themselves. They go on that little shelf where the pan lids should go."

She opened the cabinet and started putting them away, and then she asked, "So where are the pan lids then?"

"Oh, those are over there on the other side of the kitchen in the other cabinet. It was the only place big enough to hold them. They're with the bread maker and the waffle iron."

This brings us to my first point.

  1. You should document your code like you would tell a stranger how to put the dishes away. Anyone having to maintain your code is not going to think the same way you do, and your reasons for doing things may not be obvious. Someone once said, "If your code is written well enough, you shouldn't have to document it." This may hold some water for what your code is doing, but in my opinion, it doesn't help with why your code is doing what it's doing. Yes, it's tedious. Yes, it can make you feel defensive at times. But remember, your successors are not mind readers. If my dishes were my code, and I simply said, "Now, I'm putting the pan lids over here with the bread maker and the waffle iron," anyone reading my code would think I'm nuts. But, if I elaborate on the reasoning behind it, then it might make more sense (especially to those who have little kids).

Let's go back to my code == dishes analogy. One thing I found interesting is that my husband loads the dishwasher completely differently than I do. We've been living together for roughly 12 years, so one would think we would have adopted the same habits. Not true, I say. He puts the plates on the left; I put them on the right. He puts the coffee mugs on the bottom; I put them on the top. He doesn't pre-wash; I do. We get the same result most times, so it's never really been a big deal to either one of us, but one day he asked me why I pre-washed the dishes.

"To get the crusty stuff off," I said.

"But doesn't that defeat the whole purpose of having a dishwasher?"

Eventually I acquiesced and agreed to try it his way. I found that sure enough, our dishwasher was hearty enough to clean off the gunk 9 times out of 10. Of course, I still fight the habit of pre-washing, but this brings me to my second point.

  2. Documenting your code helps you examine each step of the process and question the answers. Sure, we always had clean dishes, but once I had to explain myself, and the answer I came up with wasn't exactly foolproof, I was willing to think outside of the box a bit and try something I'd previously been reluctant to try. If you're documenting your code, you're really looking in detail at how and why things are coded, and opening up the possibility of making it better. Just because something gives you the desired result doesn't mean it's the best way of doing things. And you might save yourself some tedious steps along the way (no more pre-washing!).

So anyway, I'm helping with the dishes at my parents' house after Thanksgiving, and I'm putting the silverware away. My parents always load the silverware pointing up, so that the dishes get cleaner. This doesn't work so well with our dishwasher, as you would not only stab yourself on the fork tines in order to put them away, you'd be grabbing the part that you're going to be sticking in your mouth later. But, the nature of my parents' dishwasher is such that the silverware holder is flat and opens from the front, allowing full access to each piece inside. So, you can safely and sanitarily grab them by the handle to put them away.

"This would never work at my house," I said. "We have completely different hardware."

  3. Documenting your code keeps you mindful of portability, scalability, and performance. If you're constantly looking at each step of your code, and mapping it out through documentation, you're able to see where system-specific issues might be occurring, especially if you're taking advantage of a benefit unique to that system. For example, if you're writing PHP 5 code, but you want to make it easy for someone running PHP 4 to still be able to use it, you can use documentation to flag anything PHP 5-specific.

Once, our dishwasher broke, and we had to call the repairman. (God forbid we wash our dishes by hand, mind you.)

"Tell me exactly how you loaded this thing," he said.

"Well, I put a few plates in, then the cups and silverware, alternating between what was next in the sink. Kind of like a LIFO system." He looked at me kind of funny; I guess people usually just said "I put the dishes in, duh!"

So, I continued. "Then I load this casserole dish, and the corn cob plates, and the corn holders."

He stopped me. "Did you use the little corn holder basket?" Chagrined, I said no. "Well, that's your problem. I bet you've got a corn holder stuck in the heating coil," he said smugly.

  4. If you document, you'll be saving yourself and others precious time debugging. If I had only told the repairman, "I loaded the dishes," but not what specific dishes I'd loaded, or in what order, he'd likely still be taking that thing apart, and I'd be taking a second mortgage out on my home.

I have but one more dish analogy to illustrate my fifth and final point.

Returning the favor for my mother-in-law, I was helping her do the dishes at her house one evening.

"Make sure you do the coffee cups first," she said.

Perplexed, I asked her why. She shrugged and replied, "That's just the way I like to do them."

  5. Documentation can save your ass. Sometimes there really is no logical reason for why somebody wants something done a certain way. I use the phrase "at client request" in my documentation all the time. Years from now, when someone is looking at my code, instead of trashing my name, they will see those three little words, instantly understand where I'm coming from, and really be able to feel my pain. That's also my clue to my successors that this section of code is ripe for refactoring.

In short, documentation makes you a better coder. It makes you think about what you're doing down to the last detail. It helps everybody be on the same page. It helps identify potential areas for improvement, and where bugs might be occurring. It doesn't really matter what system you use for documentation, whether it's your own way or something more widely accepted like phpDocumentor. Personally I would recommend using phpDocumentor-friendly documentation in your code. If you follow its guidelines, it can automatically generate basic documentation for your application. From the phpDocumentor site:

phpDocumentor uses an extensive templating system to change your source code comments into human readable, and hence useful, formats. This system allows the creation of easy-to-read documentation in 15 different pre-designed HTML versions, PDF format, Windows Helpfile CHM format, and Docbook XML. You can also create your own templates to match the look and feel of your project.

You can read more about phpDocumentor at phpdoc.org.
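
As a rough illustration of documenting the why and not just the what, here is a minimal sketch of a phpDocumentor-style DocBlock. The function format_price() and its rounding rule are made up for this example; the only phpDocumentor features assumed are the standard @param and @return tags.

```php
<?php

/**
 * Formats a price for display on the order summary page.
 *
 * Prices are rounded up to the next whole dollar at client request,
 * not to the nearest cent. This makes the function a candidate for
 * refactoring if that requirement ever changes.
 *
 * @param float $amount Price in dollars
 * @return string Whole-dollar price with a leading dollar sign
 */
function format_price($amount)
{
    // ceil() rounds up; number_format() adds thousands separators.
    return '$' . number_format(ceil($amount), 0);
}

echo format_price(12.01), "\n";
```

The code alone says that the price is rounded up. Only the comment says who asked for that and that it is safe to revisit.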

PHP Advent Calendar Day 1

Welcome to the PHP Advent Calendar. If you are unfamiliar with the format of an Advent calendar, Wikipedia has a pretty good description. The PHP Advent Calendar is similar in spirit to the Perl Advent Calendar, a tradition the Perl community has sustained for several years.

Each day, starting today and ending on Christmas Day, a member of the PHP community will be sharing a PHP-related tip or trick. Today's entry is provided by Sean Coates.

Sean Coates

Name
Sean Coates
Blog
blog.phpdoc.info
Biography
Sean Coates is a PHP developer who works primarily on keeping things together over at php|a by developing their software and organizing their conferences. He was formerly the Editor-in-Chief of php|architect Magazine, is the co-host of the Pro::PHP Podcast, and contributes to the PHP documentation team.
Location
Montréal, Canada

If you've ever developed a script that sends batch email to customers, you know that dreaded feeling of "what have I done?!" that hits seconds after you've launched the script, and the precise moment you remember that you forgot to turn the debug flag on, and hundreds of customers have been mailed your unfinished test template. It's an amateur mistake, but it's an easy one to make. With a little bit of clever configuration, you can mitigate the risk of stray email going to real customers from your development/staging environment.

When it comes to mail() (as well as many other things), PHP (on non-Windows, and by default) prefers to delegate the heavy lifting to another piece of software: sendmail (or a sendmail-compatible command-line mail transfer agent). By default, PHP will call your sendmail binary, and pass it the entire message, after composing it from the headers and body supplied by the developer.

One of the side-benefits to this system is the ability to override PHP's default, and seamlessly hook in your own sendmail-workalike binary or script. By setting the sendmail_path directive in php.ini, you can easily override the actual sending of email, and instead log it for easy review.

Here's an example from one of my development environments:

$ cat /path/to/php/ini | grep sendmail_path
sendmail_path=/usr/local/bin/logmail
$ cat /usr/local/bin/logmail
cat >> /tmp/logmail.log

This little bit of config code is extremely useful in a non-production environment. In the scenario above, you don't have to worry about flipping any flags or accidentally reading the "real" customer database when you meant to read the "fake" repository that contains only your own email address. Disaster avoided.

Left alone, that log file will get pretty big over time, quickly becoming unmanageable. With a little additional hackery, and with the help of the common formail app, an alternative might look like this:

$ cat /path/to/php/ini | grep sendmail_path
sendmail_path=/usr/local/bin/trapmail
$ cat /usr/local/bin/trapmail
formail -R cc X-original-cc \
  -R to X-original-to \
  -R bcc X-original-bcc \
  -f -A"To: devteam@example.com" \
| /usr/sbin/sendmail -t -i

This script traps all mail that would normally go out (say, to a customer), and instead, delivers it to devteam@example.com (with the original fields renamed for debugging purposes).
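
If formail isn't available, the same header rewriting can be approximated in PHP. The following is only a minimal sketch: trap_message() is a hypothetical helper, not part of PHP or formail, and it assumes a well-formed message whose headers are separated from the body by a blank line. It is not a full mail parser.

```php
<?php

/**
 * Rewrites the recipient headers of a raw email message so that it can
 * only reach the development team. Hypothetical helper for illustration.
 *
 * @param string $message Raw message: headers, blank line, body
 * @param string $devTo   Address that should receive the trapped mail
 * @return string Message with To, Cc, and Bcc renamed to X-original-*
 */
function trap_message($message, $devTo)
{
    // Normalize line endings so the header/body split is predictable.
    $message = str_replace("\r\n", "\n", $message);

    // Headers end at the first blank line; the rest is the body.
    $parts   = explode("\n\n", $message, 2);
    $headers = $parts[0];
    $body    = isset($parts[1]) ? $parts[1] : '';

    // Rename recipient headers that start a line, ignoring case.
    // Folded continuation lines stay attached to the renamed header.
    $headers = preg_replace(
        '/^(to|cc|bcc):/im',
        'X-original-$1:',
        $headers
    );

    // Add the one recipient that is allowed to receive the message.
    return "To: $devTo\n" . $headers . "\n\n" . $body;
}

$raw = "To: customer@example.org\nCc: boss@example.org\nSubject: Hi\n\nBody";

echo trap_message($raw, 'devteam@example.com'), "\n";
```

Wired up as a sendmail_path script, a helper like this would read the message from standard input and pipe the result to the real sendmail binary, as the formail version does.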

Sure, you could override this in your own framework (if you have a centralized mail object/function, for example), but the true beauty of this method is that it works for all calls to the mail() function, even those in third-party libraries. You do, however, still need to watch out for direct SMTP calls.
