A rev="canonical" HTTP Header

11 Apr 2009

Related: Save the Internet with rev="canonical"

Update: Recommending Link (an existing header) instead of X-Rev-Canonical. See below for syntax.

Since my post yesterday, I have noticed a lot of chatter about #revcanonical. Ben Ramsey wrote a rebuttal that argues against rev="canonical" because nothing explicitly indicates that the reverse link is in fact shorter. This is a valid point, but there is a bigger obstacle to consider, which I mentioned in my comment on his post:

Implementing rev="canonical" for sites like Dopplr and Flickr takes very little effort. Implementing it for sites like Twitter requires an HTTP request and some HTML parsing every time. That's a lot to ask, and I think it's the biggest obstacle rev="canonical" faces.

I can't imagine a site like Twitter adopting rev="canonical" for this reason. They get a lot of traffic, and they seem to struggle as it is. Why would they willingly support something that requires extra work every time someone mentions a URL?

One possible solution that at least lessens the burden is to add an HTTP header in addition to rev="canonical". This is simple to do with PHP, and I'm already supporting it on a few of my URLs (including this one):

    <?php

    // Advertise the short URL for this page. header() must be called
    // before any output is sent.
    header('Link: <http://tr.im/revheader>; rev=canonical');

    ?>

With this simple addition, the burden is reduced to a HEAD request, and the necessary parsing is a lot simpler as well. (Broken HTTP is less common than broken HTML.) Ed Finkler agrees:

I would far prefer an HTTP header over having to retrieve the document itself. I could more easily support that kind of thing in Spaz. I don't want to write something to parse broken HTML.
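To make the client side concrete, here is a minimal sketch of how a tool might read the header with a HEAD request. The function names find_rev_canonical() and fetch_rev_canonical() are hypothetical, and the parsing is deliberately simplified: it assumes no commas or semicolons appear inside quoted parameter values.

```php
<?php
// Sketch only: both function names are hypothetical, and the Link parsing
// is simplified (no commas or semicolons inside quoted parameter values).

/**
 * Return the first URL advertised with rev=canonical in raw header lines,
 * or null if there is none.
 */
function find_rev_canonical(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive; skip everything except Link.
        if (!preg_match('/^Link\s*:\s*(.*)$/i', $line, $header)) {
            continue;
        }
        // One header can carry several "<url>; param=value" entries.
        preg_match_all('/<([^>]*)>((?:\s*;[^;,]*)*)/', $header[1], $links, PREG_SET_ORDER);
        foreach ($links as $link) {
            // Accept rev=canonical, rev="canonical", and rev="canonical other".
            if (!preg_match('/;\s*rev\s*=\s*("[^"]*"|[^\s";,]+)/i', $link[2], $rev)) {
                continue;
            }
            $tokens = preg_split('/\s+/', strtolower(trim($rev[1], '"')), -1, PREG_SPLIT_NO_EMPTY);
            if (in_array('canonical', $tokens, true)) {
                return $link[1];
            }
        }
    }
    return null;
}

/**
 * Issue a HEAD request and look for a rev=canonical Link header.
 * Returns null if the request fails or no such header is present.
 */
function fetch_rev_canonical(string $url): ?string
{
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, false, $context);
    return $headers === false ? null : find_rev_canonical($headers);
}
```

A client would call fetch_rev_canonical() with the long URL and fall back to that URL (or to parsing the HTML) when the result is null.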

Of course, this idea still requires a bit of work, and I remain doubtful that Twitter will support it. Still, it's a lot simpler for Twitter clients to support, it could be helpful, and it's pretty easy to implement. If you like the idea, please pass it on.

Using a header isn't a complete solution, because tools like Simon Willison's bookmarklet use the HTML source and don't need to request the page. Thus, I think it's best to continue supporting rev="canonical" in addition to the Link header.
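Supporting both can be sketched with two small helpers that build the header and the markup from the same short URL. The helper names are hypothetical; the header format matches the example earlier in this post.

```php
<?php
// Sketch only: rev_canonical_header() and rev_canonical_element() are
// hypothetical helper names.

// Build the full header line for header().
function rev_canonical_header(string $shortUrl): string
{
    return 'Link: <' . $shortUrl . '>; rev=canonical';
}

// Build the matching <link> element for the page's <head>, escaping the URL
// so characters such as & are valid inside the attribute.
function rev_canonical_element(string $shortUrl): string
{
    return '<link rev="canonical" href="'
        . htmlspecialchars($shortUrl, ENT_QUOTES, 'UTF-8')
        . '">';
}

// Typical use, before any output is sent:
//   header(rev_canonical_header('http://tr.im/revheader'));
// and later, inside <head>:
//   echo rev_canonical_element('http://tr.im/revheader');
```

Generating both from one value keeps the header and the HTML from drifting apart.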

Matt Cutts mentions another concern:

If a URL A1 can claim it is the canonical URL for another URL A2 on the domain A, that opens up the possibility of hijacking attacks, especially on free hosts. That's why when my team at Google built consensus for rel="canonical"; we said that URLs could only give away canonicalness, not take it from other URLs. Splatting canonicalness forward from a URL is safe, but claiming canonicalness from other URLs opens up the possibility of attacks.

With every new idea, it's important to consider abuse. Google should never interpret rev="canonical" across domains as a means of stealing canonicalness, but I can't imagine Google ever making that mistake. In the context of the current discussion, this concern is irrelevant.

Jeremy Keith urges early adopters to look for rev="canonical" in <a> tags as well as <link> tags. Good advice.
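For tools that do parse the HTML, checking both kinds of tags might look something like the sketch below. It assumes PHP's DOM extension is available, and find_rev_canonical_in_html() is a hypothetical name. DOMDocument is fairly forgiving of broken markup, which is why parser warnings are suppressed here.

```php
<?php
// Sketch only: find_rev_canonical_in_html() is a hypothetical name, and the
// DOM extension is assumed to be installed.

/**
 * Return the href of every <link> or <a> element whose rev attribute
 * includes the token "canonical".
 */
function find_rev_canonical_in_html(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    // Real-world HTML is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//link[@rev and @href] | //a[@rev and @href]') as $node) {
        // rev may hold several space-separated values in any letter case.
        $tokens = preg_split('/\s+/', strtolower($node->getAttribute('rev')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('canonical', $tokens, true)) {
            $urls[] = $node->getAttribute('href');
        }
    }
    return $urls;
}
```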