About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.


URL Vanity

I'm a perfectionist. As a web architect, I tend to obsess about URLs. I want them to be simple, user-friendly, and descriptive. I want them to be beautiful. I dislike underscores, file extensions, and superfluous characters. I hate the www subdomain, avoid trailing slashes, and try to maintain a shallow hierarchy. Oh, and URLs should never change.

For the most part, shiflett.org has been adhering to these guidelines since I relaunched it in 2003. However, there are a few things that need improving.

Standalone articles such as Foiling Cross-Site Attacks and The Truth about Sessions have always had good URLs. My columns, until recently, have not. I used names like security-corner-dec2004 in the URL for the article that appeared in the December 2004 issue of php|architect. Not only is this ugly, it focuses on something people don't care about (the name of the column) instead of something they do (what the column is about). This is the real problem with not putting enough thought into your URL structure before publishing something. It's easy to change just about everything else about your web site, but changing URLs is like changing an API. You can do it, but it hurts. (In many ways, URLs are the API to your web site.)

The aforementioned security-corner-dec2004 has been renamed to cross-site-request-forgeries. Because I use a permanent redirect (301) for requests made to the old URL, indexes know to use the new one. In fact, a Google search for CSRF finds my article on cross-site request forgeries with the new, updated URL. You can generate a 301 response with PHP by sending a Location header as follows:

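<?php

// Send a permanent redirect; the third argument sets the 301 status code.
header('Location: http://example.org/', true, 301);
exit; // Stop here so no further output follows the redirect.

?>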

This mitigates most of the pain caused by changing URLs, but it doesn't update old bookmarks, del.icio.us history, and the like. Therefore, I plan to accommodate the old URLs forever. Or a really, really long time. :-)
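
To accommodate many old URLs at once, the lookup can be kept in a single map. The following is a minimal sketch, not the code this site runs: the function name `redirect_target()`, the `REDIRECTS` constant, and the `/articles/...` paths are all hypothetical.

```php
<?php

// Hypothetical map of retired paths to their replacements.
// The paths are assumptions for illustration only.
const REDIRECTS = [
    '/articles/security-corner-dec2004' => '/articles/cross-site-request-forgeries',
];

/**
 * Return the new path for a retired path, or null if the path never moved.
 */
function redirect_target(string $path, array $map = REDIRECTS): ?string
{
    return $map[$path] ?? null;
}

// In a front controller, the lookup might be used like this (not run here):
//
// $target = redirect_target(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// if ($target !== null) {
//     header('Location: http://example.org' . $target, true, 301);
//     exit;
// }
```

Keeping the map in one place means an old URL keeps working for as long as its entry stays in the array.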

The other thing that needs improving is the URL structure for my blog. My primary dislikes are:

  • Blog posts are identified by a sequential number, which has almost no semantic value.
  • Blog posts are organized under /archive in the URL, which sounds like a place for old posts.
  • The blog archive has no organization other than being sorted by date. It's a dump.

I want to rectify this situation with an approach similar to what I did for articles. However, there are a few key differences:

  • There are far more blog posts than articles, so the organization needs to be able to scale.
  • Although it hasn't happened yet, I want to accommodate duplicate titles. For example, there are only so many ways to say "OmniTI is hiring again." (We really are, by the way.)
  • There needs to be an elegant solution for browsing the blog history, and the URL structure should reflect this.

Using /blog instead of /archive is an easy choice, but /blog/omniti-is-hiring-again can only be used once, so there needs to be something else in the URL.
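
For illustration, a slug such as omniti-is-hiring-again could be derived from a post title roughly as follows. This is a sketch under assumptions: `slugify()` is a hypothetical helper, and it only handles ASCII titles.

```php
<?php

/**
 * Turn a post title into a URL slug: lowercase, with every run of
 * characters other than a-z and 0-9 collapsed into a single hyphen.
 * Assumes ASCII titles; other characters are simply replaced.
 */
function slugify(string $title): string
{
    $slug = strtolower($title);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    // Drop hyphens left at either end by leading or trailing punctuation.
    return trim($slug, '-');
}
```

Two posts with the same title produce the same slug, which is why the slug alone cannot identify a post.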

I could keep the sequential number. For example, /blog/289-url-vanity. This would also make renaming easier, but I don't like the idea of keeping that useless number in the URL. It's ugly.

After much debate with Jon, I am warming up to the idea of including the date in the URL, but only the year and month. For example, /blog/2007/jan/url-vanity. I could use 01 instead of jan, but the latter seems more user-friendly.

This approach also lets me offer /blog/2007/jan as an interface for browsing posts from January 2007 (full posts) and /blog/2007 as an interface for browsing posts from all of 2007 (abstracts). I hate URLs that you can't ascend.
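
The three levels of this hierarchy could be recognized by one small routing function. The sketch below assumes the year/month/title structure described above; `parse_blog_path()`, the `MONTHS` constant, and the returned array keys are hypothetical names, not an existing implementation.

```php
<?php

// Month abbreviations as they would appear in the URL.
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

/**
 * Map a path to a year view, a month view, or a single post.
 * Returns null for anything that does not fit the structure.
 */
function parse_blog_path(string $path): ?array
{
    $pattern = '#^/blog/(\d{4})(?:/([a-z]{3})(?:/([a-z0-9-]+))?)?$#';
    if (!preg_match($pattern, $path, $m)) {
        return null;
    }

    $year = (int) $m[1];
    $monthName = $m[2] ?? '';
    if ($monthName === '') {
        return ['view' => 'year', 'year' => $year];
    }

    // Reject three-letter segments that are not month abbreviations.
    $index = array_search($monthName, MONTHS, true);
    if ($index === false) {
        return null;
    }
    $month = $index + 1;

    $slug = $m[3] ?? '';
    if ($slug === '') {
        return ['view' => 'month', 'year' => $year, 'month' => $month];
    }

    return ['view' => 'post', 'year' => $year, 'month' => $month, 'slug' => $slug];
}
```

Because every prefix of a post URL is itself a valid route, removing the last segment always leads somewhere useful.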

Which URL structure would you choose? Do you suffer from URL vanity?

About this post

URL Vanity was posted on Sat, 13 Jan 2007.

29 comments

1.Ed Eliot said:

Interesting article. Whilst I agree that blog post URLs based on titles are more semantically appealing surely it makes the DB lookup less efficient (if only marginally). Most likely the unique number is the primary key for your blog table so looking up by a combination of title and date isn't such an optimised query. Also given that your title may contain characters which aren't suitable for use in a URL substitution has to take place both ways making the link less robust.

Sat, 13 Jan 2007 at 23:09:09 GMT Link


2.Patrick Mueller said:

Agree with most of this; most people don't care, and don't understand why urls should be kept clean. There are some nice guiding principles here:

http://www.w3.org/Provider/Style/URI

I do like to 'name' html files with an .html extension. In case someone happens to download one, and it's fairly stand-alone-ish, then guess what: works on their file system? With the 'no extension', I have to fix the file.

BTW, blog/2007/jan/xxxx won't scan well; change the jan to 01, and then bonus: you'll get free sorting.

Sun, 14 Jan 2007 at 00:07:55 GMT Link


3.Chris Shiflett said:

Hi Ed,

It would be pretty easy to use something like /blog/2007/jan/url-vanity as the primary key, because I know what that is when inserting a new post.

I try to ignore other aspects of the application when choosing a URL structure, because I don't want to make any sacrifices. If I begin thinking about implementation, I'm afraid I'll come up with something that's easier for me (the developer) but harder for everyone else (the users).

Sun, 14 Jan 2007 at 00:19:48 GMT Link


4.Chris Shiflett said:

Thanks for that link, Patrick. It was an interesting read.

It sounds like Tim agrees with me about file extensions, although his reasoning is a bit different. He considers them to be something that might change in the future. I just think they're ugly. :-)

Jon Tan (whom I referenced in the post) shares your preference for 01, 02, 03 over jan, feb, mar. I still prefer the abbreviations, but I don't have a strong opinion, so I might change my mind.

Sun, 14 Jan 2007 at 00:34:39 GMT Link


5.Paul Reinheimer said:

I'm presently re-configuring my entire blog, with the new system launching with my new (far over due) layout and theme.

I'm planning on working with a combination of systems, something like yours of year/month/title, but also a short url of simply domain.com/ID. People give each other urls in two fundamentally different ways, electronically and verbally. I want to support both.

Pretty urls like the ones you're suggesting work really well electronically, but I've always had a hard time reading such urls to other people, dashes, underscores, tildes, etc. just seem to cause confusion for the non-technical.

There are two difficulties as I see it: first, search engines don't like duplicate content at multiple URLs; second, how to give that duplicate information to people in a consistent manner. I'm still working on this.

Sun, 14 Jan 2007 at 00:35:42 GMT Link


6.Jordan said:

(most of my comments are directly applicable to WordPress below since that's what i use--saying it once here rather than each time below)

Totally agree on the need for "pretty" urls. My vote goes for the full date along with title. It's easy to set up with a standard option and certainly meets my needs (except for the trailing slash*). For example, my latest entry is:

http://wantingseed.com/sprout/2007/...9/what-he-said/

Walking up the directory structure behaves exactly as you'd expect. By using numbered month instead of abbreviated month name, the total url length isn't much longer when adding day of the month, and I like the completeness of knowing based just on the url the exact date of the post.

Ed: What substitution has to occur both ways? The title's filtered once when you post your content and stored separately (called the "post slug" and it's not expanded by default in the editor). If you don't like it, just change it.

*Huh -- what do you know -- while looking in my settings, I realized the Custom Structure of permalinks in WP allows this to be fixed really easily -- the ending slash is now removed by default on my blog.

Sun, 14 Jan 2007 at 01:40:41 GMT Link


7.Martin Jansen said:

The signature of header() is

header (string [, bool [, int]])

so your code should probably look like

header("Location: ...", true, 301);

Also note that the status code is passed as an integer, while you used a string. Not a big deal in PHP, but people who are keen on nice URLs should also be keen on "nice code" ;).

Sun, 14 Jan 2007 at 08:54:13 GMT Link


8.Ed Eliot said:

Chris, Jordan - yes I was being a bit thick. ;-) I was thinking that you'd work out the slug on each request but of course it makes much more sense to store this as an additional field created when adding the post which you look up from.

Sun, 14 Jan 2007 at 12:11:48 GMT Link


9.Chris Shiflett said:

Martin, you're right. I forgot the second argument. Thanks.

I think of HTTP requests and responses as strings, but if the manual says integer, I guess I can comply. :-)

Sun, 14 Jan 2007 at 15:28:17 GMT Link


10.Ahmed Shreef said:

Chris, there was a talk on the Zend Framework mailing list about something like that; maybe you would like to take a look or help with your thoughts:

http://www.nabble.com/Controller-subdirectory-organization-%28ZF-637%29-tf2941021s16154.html

Sun, 14 Jan 2007 at 22:51:21 GMT Link


11.another chris said:

Nice Site, how about some tutes on design?

Mon, 15 Jan 2007 at 12:38:08 GMT Link


12.Sjon said:

I'd go for "/blog/omniti-is-hiring", which indeed can only be used once, and should!

Dates in the url of the blog post are in my opinion wrong, since they're part of the metadata, not the actual content. People want to know your take on csrf, not on 2007-jan.

Mon, 15 Jan 2007 at 14:57:54 GMT Link


13.Darryl Patterson said:

I agree, descriptive URLs are all good. In the case of blogs, I'd say using a date in there is relevant, as thoughts and opinions change over time. Year/month is plenty granular enough.

As for the jan/01 debate, I'd lean towards a number, as it's a little more universal across languages (01 as a month has more meaning to someone speaking Spanish than 'jan' does). Ultimately, it's a pretty trivial debate though.

Mon, 15 Jan 2007 at 16:09:02 GMT Link


14.Nate Klaiber said:

Chris,

I previously wrote about this topic as well. I consider myself very particular when it comes to URL structure. I want to be able to accommodate friendly and descriptive links - but I also want to keep them short and to the point (easy to repeat to a friend). This is why I usually allow several different structures for my URL - but make the URL friendly link the main link. My new site will house this structure better than other sites I have done.

Personally, I don't think it is any slower because I index the post slug (unique key) in the database. If you build your database structure accordingly then it will still be quick to respond.

I work for a book publisher and I use multiple URLs in several ways. I want our books to have a friendly url, but I also want our bookstores and customers to be able to find a book by an ISBN. So, I have both URLs available to them, they can type in http://www.barbourbooks.com/book/detail/{ISBN}/ which will then apply a 301 redirect to the SEF url. Having one central location is key to maintaining a nice index (as well as avoiding looking to serve up duplicate content on 2 urls). This just gives your visitors multiple doors into your website - short and sweet URLs.

I think the key is to have the proper database structure, map out the possible doors to the website, decide on a 'main' page that will be the end landing page, and then understand your HTTP codes to send the proper code with the proper request (301, 302, etc).

And, though I hate the www. as well, I have all requests routed to use the www no matter where they come in, so it's not important one way or the other - this was just a piece that was used on our site for previous years (and is in our printed marketing material).

Mon, 15 Jan 2007 at 17:13:52 GMT Link


15.Dave Lists said:

I use mod_rewrite:

RewriteRule ^/(.*) http://www.me.com/$1 [R=301,L]

I notice that Google understands the 301, however Yahoo, MSN and nearly everyone else doesn't. Have you guys any experience of this?

Dave.

Tue, 16 Jan 2007 at 01:07:45 GMT Link


16.Chris Shiflett said:

Hi Nate,

I stay away from duplicate URLs, but if you're always redirecting to a unique URL, I guess you avoid most of the problems. (People won't accidentally post the wrong URL to del.icio.us.)

I can't help but be reminded of my Zend Framework tutorial:

http://phparch.com/zftut

This just redirects to the real URL:

http://hades.phparch.com/ceres/publ...ework::tutorial

If the real URL didn't suck so much, the other wouldn't be necessary. Seriously, that's one of the worst static URLs I've seen.

As for the www subdomain, I think it's just as bad as the hades subdomain in the URL above, if not worse. (The hades subdomain is php|architect's PHP 5 server, so at least they have an excuse.) But, as long as others are consistent and handle requests made to both, I can forgive them. :-)

Tue, 16 Jan 2007 at 15:37:58 GMT Link


17.Eamon Nerbonne said:

A true web developer post :-). Little things like url's bother me day and night too, except there doesn't seem to be a truly simple means of implementing these suggestions (as in one which easily works across all platforms I'll ever use).

In any case, I also think the numeric month is better and easier to use, and I'm almost positive that's what you should use - though not because of free sorting (how many people sort url's alphabetically?)

When people read a URL they're habituated to skip large parts of it because they are junk anyway. That's also why phishing attacks which simply include the domain name and don't even bother with advanced techniques to make details like slashes and dots match are still used. And that means that when I skim your URL, I'll be skipping the host name, and then start reading "jan url vanity". Is jan some girl's name?

Things which are similar should look similar, so make 2007 and jan as similar as possible - by making them both numeric.

And if you're hooked on bonuses, then sure you do get nice alphabetic sorting, and the URL will be a whole character shorter to boot...

Wed, 17 Jan 2007 at 07:33:17 GMT Link


18.Ben Ramsey said:

You should check out http://www.welldesignedurls.org/

The project lead there is a member of Atlanta PHP.

Wed, 17 Jan 2007 at 20:02:55 GMT Link


19.Derek Martin said:

A few thoughts...

Jan,Feb,Mar is not easily multi-linguified, whereas 01,02,03 is inherently multi-lingual.

I like your "/blog/2007/jan/url-vanity" idea. That's a good scheme, and I may adopt it myself if.... The IF that's always hanging about is how to modify the FrameworkOfTheDay to accommodate such fun.

Doing that with ZendFramework wouldn't be too difficult, but I've recently swung over to the CakePHP camp, and it seems like it might be impossible (or very tricky) to get such URLs working in CakePHP.

P.S. -- my current blog is a travesty, hence the lack of link.

Wed, 17 Jan 2007 at 21:47:04 GMT Link


20.Nate Klaiber said:

RE: Derek

CakePHP has some custom routing built into it, be sure to check that section out - it will allow you to map your URLs to the structure you prefer (most of the frameworks allow custom routing).

Where it fails, and so do some of the other frameworks, is if you want multiple 'doors' (urls) that point to one base (as I discussed above). So, if I wanted www.example.com/promo/ (an easier to 'speak' or 'print' url) to point to www.example.com/2007/01/new-item-offer/ (something much more machine friendly and information friendly) or something like that, then it is a little bit trickier to achieve by just using the Framework routing options. I have found this to be true in Cake, Zend, Code Igniter, and Symfony. The option you would have goes back to mod_rewrite - which is what I have found to be the best option - simply rewrite your doorway pages to your custom routed URL inside of your framework. This way you can send the proper HTTP code with the request (301, 302, etc). This will keep your URLs neat and tidy, while still giving you great flexibility.

And, I think we can all understand getting a blog up and running; I am in the middle of re-building mine from scratch as I haven't been happy with it for quite some time. Hope you can get it up and running soon!

Peace,

Nate

Thu, 18 Jan 2007 at 14:58:14 GMT Link


21.Chris Shiflett said:

I'm still leaning toward jan, feb, mar, etc. You're right about the multilingual issue, Derek, but since this blog uses a particular language, it seems OK to use that language in the URLs.

Plus, things like url-vanity are already language-specific.

Sun, 21 Jan 2007 at 04:16:22 GMT Link


22.Jon Gibbins said:

I changed the structure of my URLs when I reorganised my site last year. I decided that the things I post can be made unique to the date on which they are posted, and so only used the date and a title for the URL - no section or category, like 'blog' or 'tech'.

I see additional information about an entry (like section, categories, tags) as a means of navigating to that entry. While I considered that arranging entries under a section like 'blog' could be useful in some cases, I figured that the title of an entry should convey enough information as to what it is about.

And talking of dates in URLs, I mentioned months as numeric representations versus short-hand names in the notes of some recent screen reader testing:

http://dotjay.co.uk/tests/screen-re...ate-time/#notes

Sat, 27 Jan 2007 at 00:14:29 GMT Link


23.Chris Shiflett said:

Thanks for that URL, Jon. It was very insightful.

So far, only Eamon's comments are keeping me undecided. I like the fact that a numerical month will generally be skipped when someone scans the URL, and I like that I save a character.

But, that's about it. Abbreviated months seem friendlier, less ambiguous, etc.

Thu, 08 Feb 2007 at 06:15:35 GMT Link


24.streaky said:

This always drives me nuts. The best way I've personally found is to use the ID of the blog entry in there, and only rely on that, for example, I may use :

/blog/<id>/my-title-rawks

This way the ID is totally unique and I can change the title any time I want - because the ID of the entry is all the code cares about - moreover the rewrite rules I got in lighty totally ignore the title so only the id gets passed to the script, which will also always only be an int which helps elsewhere.

It also means I can reuse a title if i want.
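
The ID-only lookup described above is done with rewrite rules in lighty. As a rough equivalent in PHP (a swapped-in technique, not the commenter's setup), the ID can be pulled from the path while the title segment is ignored. `post_id_from_path()` is a hypothetical name.

```php
<?php

/**
 * Extract the numeric post ID from paths such as /blog/289/my-title-rawks.
 * The optional title segment is deliberately ignored, so renaming a post
 * never breaks an existing link. Returns null if the path has no ID.
 */
function post_id_from_path(string $path): ?int
{
    if (preg_match('#^/blog/(\d+)(?:/[^/]*)?$#', $path, $m)) {
        return (int) $m[1];
    }

    return null;
}
```

Since only the digits reach the lookup, the value handed to the rest of the script is always an integer.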

Wed, 21 Feb 2007 at 00:34:56 GMT Link


25.pedantic said:

On the other hand, URIs are opaque:

http://www.w3.org/DesignIssues/Axioms.html#opaque

Sun, 25 Mar 2007 at 00:01:49 GMT Link


26.Marc said:

I'd just like to disagree with Sjon above who said "the date is metadata". I think that's wrong, and I'll tell you why.

It's absolutely something you ought to be able to scan the URL for. I find this is particularly important, also when searching for programming information. ActionScript tutorials, or something, are basically useless if they are for version 1.0 instead of 3.0. If your URL just says:

http://mysite.com/nifty-actionscript-tutorial

That's entirely useless for me, because "as tutorial" does not mean the same thing in 2001 that it means in 2006. The date isn't just metadata; it changes the meaning of the main concepts.

Thu, 28 Jun 2007 at 06:35:24 GMT Link


27.Todd Eddy said:

I'm right in the middle of overhauling my antiquated site. Let's just say the previous url structure was... detailed. I have the same idea that URLs should not expire or at least not for a long time. Right now my apache config is filled with a ton of redirect lines as I opted to just "flip the switch" even though the new site is missing a lot of content. A lot of stuff I just decided to drop because no one used it (http response 410 says a resource is gone and no alternative has been provided, "sucks to be you" basically). There's a bunch of 301 redirects, etc. I specifically left other pages alone so they give a 404 error, since this means to a search engine "I think this page still exists, I just can't find it right now. Come back later." The Apache web server provides the very helpful mod_alias module.

Also big on the no "www." part, just seems redundant to me. Admittedly I still haven't set up a redirect rule so it will remove the www from mine.
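
As a rough sketch of what such a rule does, the host name can be normalized in PHP before redirecting. This is only one possible approach (a server-level rewrite rule is the more usual place for it), and `canonical_host()` is a hypothetical helper.

```php
<?php

/**
 * Return the host name without a leading "www." label.
 * Host names are case-insensitive, so the result is lowercased.
 */
function canonical_host(string $host): string
{
    $host = strtolower($host);

    return strncmp($host, 'www.', 4) === 0 ? substr($host, 4) : $host;
}

// In a front controller, this might be used like so (not run here):
//
// $host = $_SERVER['HTTP_HOST'];
// if (canonical_host($host) !== strtolower($host)) {
//     header('Location: http://' . canonical_host($host) . $_SERVER['REQUEST_URI'], true, 301);
//     exit;
// }
```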

I'm still on the fence about adding an extension. No extension is shorter and actually recommended by W3C so if you change underlying technologies you don't have all these now-dead links (remember .php3 and .php4?) For example my old site used server side includes because I didn't know php that well at the time, so it had .shtml extensions--gives you an idea of how old this site was.

Even though when I look at it from a programmer's perspective I'd rather use numeric dates, as an end user an abbreviated month looks better. I do like actually creating dates as a lot of things are context sensitive. Take ".../blog/file-caching" for example. Caching is a very volatile topic. What was good two years ago may be horrible now. Plus the format of /blog/(year)/(month)/(title) is fairly commonplace and well understood. You could use the two url approach like a couple above mentioned. The "permalink" to an entry would be /blog/2007/jan/url-vanity but you could have a /blog/url-vanity that you could tell people and have it always direct to the most current page.

For some great examples of what doesn't work, just look at some of the URLs people have mentioned in this post. Not to pick on them, but something like http://www.nabble.com/Controller-subdirectory-organization-%28ZF-637%29-tf2941021s16154.html is a great example of a link that just doesn't work.

Ultimately, the best thing is that once you've picked something, stick with it. Every time you change something you either just let the page give a 404 error (very bad) or you have to set up redirects, make sure any previous redirects also go to the new site, etc. So since I'm making a comment on a post 8 months old, my vote is for whatever it is right now :)

Wed, 08 Aug 2007 at 03:56:56 GMT Link


28.Joppe said:

Great article! I personally think that a numeric month is more user friendly. If the year is a number, the month should be too, because it's easier to discern when you look at the URL.

shiflett.org/blog/2007/01/url-vanity

shiflett.org/blog/2007/01/23/url-vanity

is better than

shiflett.org/blog/2007/jan/url-vanity

shiflett.org/blog/2007/jan/23/url-vanity

Thu, 06 Sep 2007 at 11:30:57 GMT Link


29.Ian Storm Taylor said:

Hey Chris,

I was just wondering... is the need for naming the year and month in the URL simply to be able to post duplicate titles?

If so, I would argue that you keep the year and remove the month completely. I see problems with both displays of months. With the numeric format, I would worry about visitors not thinking of the numerals as date and just assuming they are random database numbers (something you are keen to avoid). And with the abbreviated format, like others have said it can be confusing, and it doesn't conform to the numbers used for the year (although that is the only way to denote a year so I'm not sure how it can be argued not to fit... but that's another discussion).

Instead why not remove the month altogether? If you are worried about duplicate posts, I would hope that at least you can come up with enough titles to fill all your hiring needs in one year :P And it will be just one less step for the user to take in the URL.

Now of course, if there are other reasons for having the URLs contain the date, then by all means completely disregard my entire post hehe :)

Also, I think your URL naming over at OmniTI is GENIUS. I mean MIND-BLOWING! I LOVE it! It is the perfect type of structure for portfolio/showcase sites, and I just wanted to commend you guys for implementing it.

Alright, that's all!

(Ooh one more thing I just found... how do I subscribe to follow-up comments via email? I don't see the usual box... but maybe I'm blind.)

Tue, 07 Jul 2009 at 02:30:49 GMT Link

