PHP Advent Calendar Day 20

20 Dec 2007

Today's entry, provided by Adam Trachtenberg, is entitled User-Defined Functions in SQLite.

Adam Trachtenberg

Name
Adam Trachtenberg
Blog
trachtenberg.com
Biography
Adam Trachtenberg is the Senior Manager of Platform Evangelism and Disruptive Innovation at eBay, where he preaches the gospel of the eBay platform to developers and businessmen around the globe. He's the author of Upgrading to PHP 5 and coauthor of PHP Cookbook, both published by O'Reilly Media. Adam lives in San Francisco, California, which he wishes was closer to the office.
Location
San Francisco, California

SQLite is a database that's bundled with PHP 5. Unlike most other databases, SQLite is not a separate application; it's an extension that reads from and writes to regular files.

Although the name SQLite hints at a less than full-featured product, besides the usual INSERTs and SELECTs, SQLite also boasts transactions, subselects, and triggers.

My favorite SQLite feature is the one that allows you to write your own SQL functions. Because, in addition to all the built-in SQL functions, such as lower() and upper(), you can extend SQLite to include your own functions that you write in PHP.

These are known as user-defined functions, or UDFs for short. With a UDF, you embed logic into SQLite and avoid doing it yourself in PHP. Thus, you can take advantage of all of the features inherent in a database, such as sorting and finding distinct entries.

UDFs are good for chopping up strings so you can perform nonstandard collations and groupings. For example, you want to sort through a list of URLs, maybe from a referrer log file, and create a list of unique hostnames sorted alphabetically. So, http://example.com/directory/index.html and http://example.com/page.html would both map to one entry: http://example.com/.

To do this in PHP, you need to retrieve all the URLs, process them inside your script, and then sort them. Plus, somewhere in all that, you need to remove the duplicates. If it weren't for that pesky URL-conversion process, this could all be done in SQL using DISTINCT and ORDER BY.

With a UDF like the following, you foist all that hard work back onto SQLite where it belongs:

  1. <?php
  2.  
  3. // CREATE table and INSERT URLs.
  4. $db = sqlite_open('/www/support/log.db');
  5. $sql = 'CREATE TABLE access_log(url);';
  6.  
  7. $urls = array('http://example.com/directory/index.html',
  8.               'http://example.com/page.html');
  9.  
  10. foreach ($urls as $url) {
  11.     $sql .= "INSERT
  12.              INTO access_log
  13.              VALUES ('$url');";
  14. }
  15. sqlite_query($db, $sql);
  16.  
  17. // UDF Written in PHP
  18. function url2host($url) {
  19.     $parts = parse_url($url);
  20.     return "{$parts['scheme']}://{$parts['host']}/";
  21. }
  22.  
  23. // Map the PHP function url2host() to the SQL function host(),
  24. // and indicate that host() will take 1 argument.
  25. sqlite_create_function($db, 'host', 'url2host', 1);
  26.  
  27. $r = sqlite_query($db, 'SELECT
  28.                         DISTINCT host(lower(url))
  29.                         AS clean_host
  30.                         FROM access_log
  31.                         ORDER BY clean_host;');
  32.  
  33. // Loop through results.
  34. while ($row = sqlite_fetch_array($r)) {
  35.     echo "{$row['clean_host']}\n";
  36. }
  37.  
  38. ?>

As expected, this outputs the following:

http://example.com/

To use a UDF, you first write a regular function in PHP. The function's arguments are what you want to pass in during the SELECT, and the function should return a single value. The url2host() function takes a URL, calls the built-in PHP function parse_url() to break the URL into its component parts, and returns a string containing the scheme (http) and the host. So, http://example.com/directory/index.html gets broken apart into many pieces. http is stored in $parts['scheme'], and example.com goes in $parts['host']. This creates a return value of http://example.com/.

The next step is to register url2host() with SQLite using sqlite_create_function(). This function takes four arguments:

  1. The Database Handle

  2. The SQLite Function Name of Your Choice

  3. The Function's Name in PHP

  4. The Number of Expected Arguments

The last argument is optional, but if you know for certain that your function only accepts a specific number of arguments, providing this information helps SQLite optimize things behind the scenes. In this example, the SQL function is host(), while the PHP function is url2host(). These names can be the same; they're different here to make the distinction between them clear.

Now you can use host() inside any SQL query that use the same database connection. The SQL above SELECTs host(lower(url)) AS clean_host. This takes the URL stored in the url column, converts it to lowercase, and calls the UDF host().

The function is not permanently registered with the database, and it goes away when you close the database. If you want to use it when you reopen the database, you must reregister it. Also, the function is only registered for that database; if you open up a new database using sqlite_connect(), you need to call sqlite_create_function() again.

The returned string is then named AS clean_host; this lets you refer to the results later on in the SQL query as well as access the value in PHP using that name. Since you're still in SQLite, you can take advantage of this to sort the list using ORDER BY clean_host. This sorts the results in alphabetical order.

There are plenty of other things you can do with UDFs, but hopefully this is enough to pique your interest and get you started.