URL Vanity, by Chris Shiflett


13 Jan 2007

I'm a perfectionist. As a web architect, I tend to obsess about URLs. I want them to be simple, user-friendly, and descriptive. I want them to be beautiful. I dislike underscores, file extensions, and superfluous characters. I hate the www subdomain, avoid trailing slashes, and try to maintain a shallow hierarchy. Oh, and URLs should never change.

For the most part, shiflett.org has been adhering to these guidelines since I relaunched it in 2003. However, there are a few things that need improving.

Standalone articles such as Foiling Cross-Site Attacks and The Truth about Sessions have always had good URLs. My columns, until recently, have not. I used names like security-corner-dec2004 in the URL for the article that appeared in the December 2004 issue of php|architect. Not only is this ugly, it focuses on something people don't care about (the name of the column) instead of something they do (what the column is about). This is the real problem with not putting enough thought into your URL structure before publishing something. It's easy to change just about everything else about your web site, but changing URLs is like changing an API. You can do it, but it hurts. (In many ways, URLs are the API to your web site.)

The aforementioned security-corner-dec2004 has been renamed to cross-site-request-forgeries. Because I use a permanent redirect (301) for requests made to the old URL, search engines know to index the new one instead. In fact, a Google search for CSRF finds my article on cross-site request forgeries with the new, updated URL. You can generate a 301 response with PHP by sending a Location header as follows:

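<?php
// Permanently redirect to the new URL. header() must be called before any
// output is sent, and exit keeps the rest of the script from running.
header('Location: http://example.org/', TRUE, 301);
exit;
?>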

This mitigates most of the pain caused by changing URLs, but it doesn't update old bookmarks, del.icio.us history, and the like. Therefore, I plan to accommodate the old URLs forever. Or a really, really long time. :-)
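Keeping old URLs alive indefinitely can be as simple as a lookup table that is consulted before anything else runs. The following is a minimal sketch, not the code behind this site: the RENAMED table, the renamed_path() function, and the assumption that both the old and new names sit directly under the site root are all hypothetical.

```php
<?php
// Hypothetical table of renamed pages: old path => new path.
// Paths are illustrative; add one entry per renamed URL.
const RENAMED = [
    '/security-corner-dec2004' => '/cross-site-request-forgeries',
];

// Return the new path for a renamed URL, or null if it was never renamed.
function renamed_path(string $path): ?string
{
    return RENAMED[$path] ?? null;
}

// In a front controller: extract the path (ignoring any query string) and,
// if it was renamed, redirect permanently before sending any output.
$path = (string) parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
$new = renamed_path($path);
if ($new !== null) {
    header('Location: http://example.org' . $new, true, 301);
    exit;
}
```

Because the table only grows, old bookmarks keep working for as long as the entries stay in place.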

The other thing that needs improving is the URL structure for my blog. My primary dislikes are:

Blog posts are identified by a sequential number, which has almost no semantic value.
Blog posts are organized under /archive in the URL, which sounds like a place for old posts.
The blog archive has no organization other than being sorted by date. It's a dump.

I want to rectify this situation with an approach similar to what I did for articles. However, there are a few key differences:

There are far more blog posts than articles, so the organization needs to be able to scale.
Although it hasn't happened yet, I want to accommodate duplicate titles. For example, there are only so many ways to say "OmniTI is hiring again." (We really are, by the way.)
There needs to be an elegant solution for browsing the blog history, and the URL structure should reflect this.

Using /blog instead of /archive is an easy choice, but /blog/omniti-is-hiring-again can only be used once, so there needs to be something else in the URL.
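Deriving the slug from the title is mechanical, which is exactly why two posts with the same title collide. A minimal sketch, assuming a hypothetical slugify() helper and ASCII titles (any other characters are simply dropped):

```php
<?php
// Hypothetical helper: turn a post title into a lowercase, hyphenated slug.
// Assumes ASCII titles; non-ASCII characters are treated as separators.
function slugify(string $title): string
{
    $slug = strtolower($title);
    // Replace every run of characters other than a-z and 0-9 (spaces,
    // punctuation, underscores) with a single hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left at either end, e.g. from trailing punctuation.
    return trim($slug, '-');
}

echo slugify('OmniTI is hiring again.'), "\n"; // omniti-is-hiring-again
```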

I could keep the sequential number. For example, /blog/289-url-vanity. This would also make renaming easier, but I don't like the idea of keeping that useless number in the URL. It's ugly.

After much debate with Jon, I am warming up to the idea of including the date in the URL, but only the year and month. For example, /blog/2007/jan/url-vanity. I could use 01 instead of jan, but the latter seems more user-friendly.
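To illustrate the proposed structure, here is a minimal sketch that builds /blog/2007/jan/url-vanity from a post's publication date and slug. The function name post_path() is hypothetical. PHP's DateTimeInterface::format('M') always returns an English three-letter month regardless of locale, so producing jan instead of 01 costs nothing.

```php
<?php
// Hypothetical helper: build /blog/YYYY/mon/slug for a post.
function post_path(DateTimeInterface $published, string $slug): string
{
    // format('M') yields "Jan", "Feb", ... (not locale-dependent),
    // so lowercase it for the URL.
    return sprintf('/blog/%s/%s/%s',
        $published->format('Y'),
        strtolower($published->format('M')),
        $slug);
}

echo post_path(new DateTimeImmutable('2007-01-13'), 'url-vanity'), "\n";
// /blog/2007/jan/url-vanity
```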

This approach also lets me offer /blog/2007/jan as an interface for browsing posts from January 2007 (full posts) and /blog/2007 as an interface for browsing posts from all of 2007 (abstracts). I hate URLs that you can't ascend.
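Ascendable URLs fall out naturally if the router treats each trailing segment as optional. A minimal sketch, assuming a hypothetical route_blog() function that returns the view to render, or null when the path should produce a 404; the view names blog, year, month, and post are placeholders.

```php
<?php
// Hypothetical router for /blog, /blog/2007, /blog/2007/jan, and
// /blog/2007/jan/url-vanity. Dropping a segment always yields a valid page.
function route_blog(string $path): ?array
{
    $months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

    // "/blog/2007/jan/url-vanity" => ["blog", "2007", "jan", "url-vanity"]
    $parts = explode('/', trim($path, '/'));
    if ($parts[0] !== 'blog') {
        return null;
    }
    $count = count($parts);
    if ($count === 1) {
        return ['view' => 'blog'];
    }

    // The second segment must be a four-digit year.
    if (!preg_match('/\A[0-9]{4}\z/', $parts[1])) {
        return null;
    }
    $year = (int) $parts[1];
    if ($count === 2) {
        return ['view' => 'year', 'year' => $year]; // abstracts for the year
    }

    // The third segment must be a lowercase month abbreviation.
    $index = array_search($parts[2], $months, true);
    if ($index === false) {
        return null;
    }
    if ($count === 3) {
        return ['view' => 'month', 'year' => $year, 'month' => $index + 1]; // full posts
    }
    if ($count === 4) {
        return ['view' => 'post', 'year' => $year, 'month' => $index + 1, 'slug' => $parts[3]];
    }
    return null; // too many segments
}
```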

Which URL structure would you choose? Do you suffer from URL vanity?
