About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.


All posts for Apr 2009

A rev="canonical" HTTP Header

Related: Save the Internet with rev="canonical"

Update: Recommending Link (an existing header) instead of X-Rev-Canonical. See below for syntax.

Since my post yesterday, I have noticed a lot of chatter about #revcanonical. Ben Ramsey wrote a rebuttal that argues against rev="canonical" because nothing explicitly indicates that the reverse link is in fact shorter. This is a valid point, but there is a bigger obstacle to consider, one I mentioned in my comment on his post:

Implementing rev="canonical" for sites like Dopplr and Flickr takes very little effort. Implementing it for sites like Twitter requires an HTTP request and some HTML parsing every time. That's a lot to ask, and I think it's the biggest obstacle rev="canonical" faces.

I can't imagine a site like Twitter adopting rev="canonical" for this reason. They get a lot of traffic, and they seem to struggle as it is. Why would they willingly support something that requires extra work every time someone mentions a URL?

One possible solution that at least lessens the burden is to add an HTTP header in addition to rev="canonical". This is simple to do with PHP, and I'm already supporting it on a few of my URLs (including this one):

<?php
 
header('Link: <http://tr.im/revheader>; rev=canonical');
 
?>

With this simple addition, the burden is reduced to a HEAD request, and the necessary parsing is a lot simpler as well. (Broken HTTP is less common than broken HTML.) Ed Finkler agrees:

I would far prefer an HTTP header over having to retrieve the document itself. I could more easily support that kind of thing in Spaz. I don't want to write something to parse broken HTML.
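To make the client side concrete, here is a minimal sketch of what a Twitter client might do with the header. The function names (findRevCanonical, fetchLinkHeaders) are made up for illustration, and the parsing is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values.

```php
<?php

// Hypothetical helper: return the URL marked rev=canonical in a list of
// raw Link header values, or null if there is none.
function findRevCanonical(array $linkHeaders): ?string
{
    foreach ($linkHeaders as $header) {
        // One Link header can carry several comma-separated links, so
        // match each <url> together with the parameters that follow it.
        $pattern = '/<([^>]*)>((?:\s*;\s*[^,;]+)*)/';
        if (!preg_match_all($pattern, $header, $matches, PREG_SET_ORDER)) {
            continue;
        }
        foreach ($matches as $m) {
            // Accept rev=canonical and rev="canonical", in any case.
            if (preg_match('/;\s*rev\s*=\s*"?([^";]*)"?/i', $m[2], $rev)) {
                $types = preg_split('/\s+/', strtolower(trim($rev[1])));
                if (in_array('canonical', $types, true)) {
                    return $m[1];
                }
            }
        }
    }
    return null;
}

// Hypothetical helper: issue a HEAD request and collect the Link headers.
function fetchLinkHeaders(string $url): array
{
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $lines = @get_headers($url, false, $context) ?: [];
    $links = [];
    foreach ($lines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Link:') === 0) {
            $links[] = trim(substr($line, 5));
        }
    }
    return $links;
}
```

A client would call findRevCanonical(fetchLinkHeaders($url)) and fall back to a third-party shortener when the result is null.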

Of course, this idea still requires a bit of work, and I remain doubtful that Twitter will support it, but it's at least a lot simpler for Twitter clients to support, and it could be helpful. Plus, it's pretty easy to implement. If you like the idea, please pass it on.

Using a header isn't a complete solution, because tools like Simon Willison's bookmarklet use the HTML source and don't need to request the page. Thus, I think it's best to continue supporting rev="canonical" in addition to the Link header.
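One way to keep the two in sync is to build both from the same short URL. This is only a sketch, and the helper names (revCanonicalHeader, revCanonicalTag) are invented for the example:

```php
<?php

// Hypothetical helper: the Link header line for a given short URL.
function revCanonicalHeader(string $shortUrl): string
{
    return 'Link: <' . $shortUrl . '>; rev=canonical';
}

// Hypothetical helper: the matching <link> element for the page's <head>.
function revCanonicalTag(string $shortUrl): string
{
    // Escape the URL so characters like & and " cannot break the attribute.
    $href = htmlspecialchars($shortUrl, ENT_QUOTES, 'UTF-8');
    return '<link rev="canonical" href="' . $href . '" />';
}
```

The page would pass the first string to header() before sending any output and echo the second inside its head element.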

Matt Cutts mentions another concern:

If a URL A1 can claim it is the canonical URL for another URL A2 on the domain A, that opens up the possibility of hijacking attacks, especially on free hosts. That's why when my team at Google built consensus for rel="canonical", we said that URLs could only give away canonicalness, not take it from other URLs. Splatting canonicalness forward from a URL is safe, but claiming canonicalness from other URLs opens up the possibility of attacks.

With every new idea, it's important to consider abuse. Google should never interpret rev="canonical" across domains as a means of stealing canonicalness, but I can't imagine Google ever making that mistake. In the context of the current discussion, this concern is irrelevant.

Jeremy Keith urges early adopters to look for rev="canonical" in <a> tags as well as <link> tags. Good advice.
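For anyone who does parse the HTML, a rough sketch that checks both kinds of tags might look like the following. It assumes PHP's DOM extension is available, and the function name findRevCanonicalInHtml is made up for illustration.

```php
<?php

// Hypothetical helper: return the href of the first <link> or <a> element
// whose rev attribute includes "canonical", or null if there is none.
function findRevCanonicalInHtml(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parse errors internally instead of emitting warnings,
    // because real-world HTML is often broken.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['link', 'a'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            // rev can hold several space-separated values.
            $rev = strtolower(trim($element->getAttribute('rev')));
            $href = $element->getAttribute('href');
            if ($href !== '' && in_array('canonical', preg_split('/\s+/', $rev), true)) {
                return $href;
            }
        }
    }
    return null;
}
```

This sketch prefers link elements over a elements when a page has both, which is an arbitrary choice.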

Save the Internet with rev="canonical"

Related: A rev="canonical" HTTP Header

Slashdot: Note that rev="canonical" (reverse link) and rel="canonical" (forward link) indicate the same relationship in opposite directions. Also, be careful not to make the assumption that shorter URLs are always better. Obviously, I prefer the URL I'm using, but if you require a shorter one, please use http://tr.im/revcanonical. (I use rev="canonical" to indicate this preference, which is what this post is all about.) For more information about my obsession with URLs, see URL Vanity and URLs Can Be Beautiful. Thanks for reading!

There's a new proposal ("URL shortening that doesn't hurt the Internet") floating around for using rev="canonical" to help put a stop to the URL-shortening madness. It sounds like a pretty good idea, and based on some discussions on IRC this morning, I think a more thorough explanation would be helpful. I'm going to try.

The premise is pretty simple. In order to avoid the great linkrot apocalypse, we can opt to specify short URLs for our own pages, so that compliant services (adoption is still low, because the idea is pretty fresh) will use our short URLs instead of replacements from TinyURL.com (or some other third-party alternative).

This is easiest to explain with an example. I have an article about CSRF located at the following URL:

http://shiflett.org/articles/cross-site-request-forgeries

I happen to think this URL is beautiful. :-) Unfortunately, it is sure to get mangled into some garbage URL if you try to talk about it on Twitter, because it's not very short. I really hate when that happens. What can I do?

If rev="canonical" gains momentum and support, I can offer my own short URL for people who need one. Perhaps I decide the following is an acceptable alternative:

http://shiflett.org/csrf

Here are some clear advantages this URL has over any TinyURL.com replacement:

  • The URL is mine. If it goes away, it's my fault. (Ma.gnolia reminds us of the potential for data loss when relying on third parties.)
  • The URL has meaning. Both the domain (shiflett.org) and the path (csrf) are meaningful.
  • Because the URL has meaning, visitors who click the link know where they're going.
  • I can search for links to my content; they're not hidden behind an indefinite number of short URLs.

There are other advantages, but these are the few I can think of quickly.

With rev="canonical", I can indicate my preferred short URL for the canonical one. I just have to hope the idea catches on.

First, I need to make sure my short URL redirects to the canonical URL. I can do this with PHP:

<?php
 
header('Location: http://shiflett.org/articles/cross-site-request-forgeries', TRUE, 301);
 
?>

This results in a 301 (permanent) redirect, which is what I want. (Thanks to Vanessa's comment, I have learned that this is interpreted the same as rel="canonical".)
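With more than one short URL, the same redirect can be driven by a lookup table. The following is a minimal sketch under that assumption; the SHORT_URLS map and the function names are hypothetical, not how shiflett.org is actually built.

```php
<?php

// Hypothetical map of short paths to canonical URLs.
const SHORT_URLS = [
    '/csrf' => 'http://shiflett.org/articles/cross-site-request-forgeries',
];

// Return the canonical URL for a request URI, or null if it is not a
// known short URL. The query string and a trailing slash are ignored.
function canonicalFor(string $requestUri, array $map = SHORT_URLS): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    return $map[rtrim($path, '/')] ?? null;
}

// Send the 301 redirect for a short URL, or a 404 if it is unknown.
function redirectToCanonical(string $requestUri): void
{
    $target = canonicalFor($requestUri);
    if ($target === null) {
        http_response_code(404);
        return;
    }
    header('Location: ' . $target, true, 301);
}
```

The script that handles short URLs would call redirectToCanonical($_SERVER['REQUEST_URI']) before sending any output.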

With my short URL redirecting to the canonical one, I just need to add rev="canonical" to the canonical (long) URL:

<link rev="canonical" href="http://shiflett.org/csrf" />

If Twitter adopts this, then whenever someone uses the canonical URL, Twitter will replace it with my preferred short URL instead of some TinyURL.com garbage. Wouldn't that be nice?

There is some confusion between rev="canonical" and rel="alternate shorter". The former means the current URL is the canonical equivalent of the URL in the href attribute. (Thus, it is the opposite of rel="canonical".) The latter indicates the same thing but also means the URL in the href attribute is shorter. In practice, all you really need is rev="canonical", as indicated by Dopplr's support:

<link rev="canonical" href="http://dplr.it/brooklyn" /><!-- http://revcanonical.appspot.com/ -->

There is a tool you can use to test Dopplr's implementation, test my example, or test your own.

I like to give credit where credit is due, so I asked Kellan Elliott-McCrea (@kellan) to tell us about the idea's history:

The idea emerged in conversation between myself, Les Orchard, and Kevin Marks. (Rafe Colburn suggested something similar about 2 years ago.) Niall Kennedy and Shawn Medero provided useful comments. I just documented and wrote the code.

It is already being supported by Dopplr, PHP.net, Ars Technica, and Flickr. Let's hope Twitter jumps on the bandwagon soon!

If you use Twitter and want to join the discussion, please use the #revcanonical tag. You can also follow me (@shiflett), since I'm sure to be interested in this for a while longer. :-)

CSS Naked Day

You might be wondering what happened to my design. As with years past (2007, 2008), I'm participating in CSS Naked Day to show my support for web standards, and to show off the design of shiflett.org:

The idea behind this event is to promote web standards. Plain and simple. This includes proper use of (X)HTML, semantic markup, a good hierarchy structure, and of course, a good ol' play on words. It's time to show off your <body>.

Although I haven't the time to fully explain this thought right now, you have to look beyond the surface to truly appreciate good design, so participating in CSS Naked Day does more to show off my design than to hide it.

This is what Håkon Wium Lie has to say about the event:

This is a fun idea, fully in line with the reasons for creating CSS in the first place. While most designers are attracted by the extra presentational capabilities, saving HTML from becoming a presentational language was probably a more important motivation for most people who participated in the beginning.

Is your site naked?