About the Author

Chris Shiflett

Hi, I’m Chris: web craftsman, community leader, husband, father, and partner at Fictive Kin.


A rev="canonical" HTTP Header

Related: Save the Internet with rev="canonical"

Update: Recommending Link (an existing header) instead of X-Rev-Canonical. See below for syntax.

Since my post yesterday, I have noticed a lot of chatter about #revcanonical. Ben Ramsey wrote a rebuttal that argues against rev="canonical" because nothing explicitly indicates that the reverse link is in fact shorter. This is a valid point, but there is a bigger obstacle to consider, which I mentioned in my comment on his post:

Implementing rev="canonical" for sites like Dopplr and Flickr takes very little effort. Implementing it for sites like Twitter requires an HTTP request and some HTML parsing every time. That's a lot to ask, and I think it's the biggest obstacle rev="canonical" faces.

I can't imagine a site like Twitter adopting rev="canonical" for this reason. They get a lot of traffic, and they seem to struggle as it is. Why would they willingly support something that requires extra work every time someone mentions a URL?

One possible solution that at least lessens the burden is to add an HTTP header in addition to rev="canonical". This is simple to do with PHP, and I'm already supporting it on a few of my URLs (including this one):

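<?php

// Advertise this page's short URL in the HTTP response.
// header() must be called before any output is sent.
header('Link: <http://tr.im/revheader>; rev=canonical');

?>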

With this simple addition, the burden is reduced to a HEAD request, and the necessary parsing is a lot simpler as well. (Broken HTTP is less common than broken HTML.) Ed Finkler agrees:

I would far prefer an HTTP header over having to retrieve the document itself. I could more easily support that kind of thing in Spaz. I don't want to write something to parse broken HTML.

Of course, this idea still requires a bit of work, and I remain doubtful that Twitter will support it, but it's at least a lot simpler for Twitter clients to support, and it could be helpful. Plus, it's pretty easy to implement. If you like the idea, please pass it on.
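To give a rough idea of what a client would need to do, here is a minimal sketch that issues a HEAD request and looks for a rev=canonical entry in any Link header. The function names find_rev_canonical and fetch_short_url are hypothetical, the regular expressions are a simplification rather than a full parser for the Link grammar (they assume no commas or angle brackets inside parameter values), and passing a stream context to get_headers() assumes PHP 7.1 or later.

```php
<?php

/**
 * Return the first URL advertised with rev=canonical in a list of raw
 * HTTP response header lines, or null if there is none.
 * (Hypothetical helper for illustration, not an existing API.)
 */
function find_rev_canonical(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Only Link headers are of interest; header names are case-insensitive.
        if (stripos($line, 'Link:') !== 0) {
            continue;
        }
        $value = substr($line, strlen('Link:'));

        // Split the value into "<url> params" pairs. Simplification: assumes
        // no commas or angle brackets appear inside the parameters.
        preg_match_all('/<([^>]*)>([^,]*)/', $value, $links, PREG_SET_ORDER);
        foreach ($links as $link) {
            // Accept rev=canonical and rev="canonical", including
            // space-separated lists such as rev="canonical other".
            if (preg_match('/;\s*rev\s*=\s*"?([^";]*)"?/i', $link[2], $rev)) {
                $types = preg_split('/\s+/', strtolower(trim($rev[1])));
                if (in_array('canonical', $types, true)) {
                    return $link[1];
                }
            }
        }
    }
    return null;
}

/**
 * Issue a HEAD request for $url and return its rev=canonical URL, if any.
 * (Hypothetical helper; requires network access and allow_url_fopen.)
 */
function fetch_short_url(string $url): ?string
{
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, false, $context);
    return $headers === false ? null : find_rev_canonical($headers);
}
```

Calling fetch_short_url() with a page's URL returns the short URL when the server sends one and null otherwise, so a client can fall back to whatever shortening service it already uses.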

Using a header isn't a complete solution, because tools like Simon Willison's bookmarklet use the HTML source and don't need to request the page. Thus, I think it's best to continue supporting rev="canonical" in addition to the Link header.

Matt Cutts mentions another concern:

If a URL A1 can claim it is the canonical URL for another URL A2 on the domain A, that opens up the possibility of hijacking attacks, especially on free hosts. That's why when my team at Google built consensus for rel="canonical", we said that URLs could only give away canonicalness, not take it from other URLs. Splatting canonicalness forward from a URL is safe, but claiming canonicalness from other URLs opens up the possibility of attacks.

With every new idea, it's important to consider abuse. Google should never interpret rev="canonical" across domains as a means of stealing canonicalness, but I can't imagine Google ever making that mistake. In the context of the current discussion, this concern is irrelevant.

Jeremy Keith urges early adopters to look for rev="canonical" in <a> tags as well as <link> tags. Good advice.
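For tools that already have the HTML source, checking both kinds of tags could look something like the following minimal sketch. It relies on PHP's DOM extension, which tolerates a fair amount of broken markup; the function name find_rev_canonical_in_html is hypothetical, and checking <link> elements before <a> elements is my own assumption, not something anyone has specified.

```php
<?php

/**
 * Return the href of the first <link> or <a> element whose rev attribute
 * contains "canonical", or null if none is found.
 * (Hypothetical helper for illustration, not an existing API.)
 */
function find_rev_canonical_in_html(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    // Collect parse warnings internally instead of emitting them,
    // because real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: prefer <link> elements, then fall back to <a> elements.
    foreach (['link', 'a'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            // rev may hold several space-separated values.
            $revs = preg_split('/\s+/', strtolower(trim($element->getAttribute('rev'))));
            if (in_array('canonical', $revs, true) && $element->hasAttribute('href')) {
                return $element->getAttribute('href');
            }
        }
    }
    return null;
}
```

The function returns null when the page offers no hint, which leaves the caller free to fall back to the Link header or to a shortening service.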

About this post

A rev="canonical" HTTP Header was posted on Sat, 11 Apr 2009.

28 comments

1.Jaap said:

I don't think you can discard Matt Cutts's comment as easily as that.

If, for example, a malicious person is somehow able to inject a rev=canonical element into a link-worthy blog post, he will be able to take control of the canonical short link, effectively taking people to an unintended destination.

Another way would be to create a site full of (scraped) "good" content. If someone then creates a link to one of these pages, the canonical HTTP header kicks in. By having full control over the "good" site, you can respond with canonical URLs that have other intentions.

Both issues also relate to one if current issues with short links anyway, they make the destination opaque.

Sat, 11 Apr 2009 at 20:30:50 GMT Link


2.Jaap said:

I intended the last sentence to read "Both issues also relate to one of the current issues with short links anyway, they make the destination opaque."

Nevertheless I really like the idea of rev canonical. I can see this spreading rapidly if some of the blog publishing platforms and content management systems gain the ability to add the element and inject the response header. Either natively or by a plugin.

Sat, 11 Apr 2009 at 20:34:50 GMT Link


3.Chris Shiflett said:

Jaap, you're right about the XSS risk inherent in this idea. This is something first mentioned in Sean's comment on my previous post.

Existing XSS exploits are already much worse than this, so I'm not sure I can see the concern. I think Matt's point was that I shouldn't be able to take canonicalness by using rev="canonical" on one of my pages:

<link rev="canonical" href="http://google.com/" />

In no way should this allow me to take any SEO love from Google. Of course, I can't imagine anyone like Google ever interpreting it that way, so it's irrelevant.

Regarding plugins, there's already a WordPress one by Rob Allen that supports rev="canonical" as well as the X-Rev-Canonical header:

http://akrabat.com/shorter-links/

Thanks for commenting!

Sat, 11 Apr 2009 at 20:44:18 GMT Link


4.Chris Shiflett said:

I wanted to share a comment from Andy Mabbett:

The solution to the problem Matt Cutts highlights is to only accept rel="shortcut" (or, for that matter, rev="canonical") if a reciprocal rel="canonical" is also in place (as with rel="me"). In which case, passing that relationship across domains should also be acceptable.

As a solution to Google's problem, this makes sense, but I don't really see the point. Google already respects rel="canonical", and it doesn't matter if there is a reciprocal rev="canonical" at the other URL.

In other words, this effectively changes nothing in Google's behavior, and it further supports my claim that Matt's concern is irrelevant in this case.

Sat, 11 Apr 2009 at 23:31:53 GMT Link


5.Stephen Paul Weber said:

The Link: header is meant for exactly this sort of use :)

Sun, 12 Apr 2009 at 00:03:33 GMT Link


6.Clint Ecker said:

Hey everyone. I just started inserting X-Rev-Canonical headers on articles @ Ars Technica. Currently they'll only be in place for new-ish articles (stuff published in the past day or two), but will be in place on all new stuff.

Once I get around to doing a full rebuild of the site (we publish static files) they'll be in place for every article we've ever published.

Here are some examples:

http://arstechnica.com/apple/news/2...phone-os-30.ars

http://arstechnica.com/gaming/news/...4/wolverine.ars

http://arstechnica.com/microsoft/ne...s-want-macs.ars

Sun, 12 Apr 2009 at 00:19:07 GMT Link


7.Clint Ecker said:

@Stephen

Would it operate in this manner?

Link: <http://arst.ch/8b>; REL=short_url

Sun, 12 Apr 2009 at 00:20:40 GMT Link


8.Chris Shiflett said:

Very cool, Clint. I'm happy to see Ars Technica helping to promote the idea.

Sun, 12 Apr 2009 at 01:20:55 GMT Link


9.Chris Shiflett said:

Clint's post about the X-Rev-Canonical support:

http://blog.clintecker.com/post/952...on-ars-technica

Sun, 12 Apr 2009 at 01:38:04 GMT Link


10.Chris Shiflett said:

Regarding Link, I believe the proper syntax would be as follows:

Link: <http://arst.ch/8b>; rev=canonical

I didn't realize Link was being brought back from the dead, else I would have suggested this syntax instead. (Thanks for the tip, Stephen.)

This would also allow rel links to be communicated via HTTP headers in cases where the URL is redirecting and not sending any content.

Thoughts?

Sun, 12 Apr 2009 at 03:22:29 GMT Link


11.Ben Ramsey said:

Yep. Looks like Link is still in the Internet-Draft stage, but it's current, which is a good thing.

My preference would be to see Link used for this rather than introducing X-Rev-Canonical. The Link header is extensible, provides the full benefit of HTML link tags in an HTTP header, and is already well-defined.

http://tools.ietf.org/html/draft-no...-link-header-04

Sun, 12 Apr 2009 at 04:57:10 GMT Link


12.Ben Ramsey said:

I meant to include this in my previous comment, but following your example, you could also do something like the following:

Link: <http://brtny.me/382>; rel="alternate shorter"; title="Short URL for Post"

While reading the Internet-Draft for Link, I found it interesting to note that it states:

Applications that don't merit a registered relation type may use an extension relation type. An extension relation type is a URI that, when dereferenced, SHOULD yield a document describing that relation type.

So, if using the Link header with either rel or rev, then until "canonical" or "shorter" is accepted as a registered IANA relation type, the Link header syntax should, according to the spec, be something like the following:

Link: <http://arst.ch/8b>; rev="http://revcanonical.appspot.com/#rev-canonical"

Sun, 12 Apr 2009 at 05:12:43 GMT Link


13.Rob Allen said:

Of course, without a rel parameter, it is unclear what the purpose of the Link is, unless everyone uses the same URL in the rev section. i.e. I could use:

Link: <http://akrabat.com/zft>; rev="http://akrabat.com/shorter-links"

but who would know what that link was meant for?

Arguably,

Link: <http://akrabat.com/zft>; rel="alternate"; rev="http://akrabat.com/shorter-links"

would give clients a hint that this link is another view of the same document. Of course, the client would have to infer that the lack of type, lang and media parameters meant that it was another link to the same document.

Clearly it's simpler if "shorter" is registered :)

The advantage of an X- header is that it's not out-of-spec, but, like Ben, I'm not a fan of rev="canonical", so would have preferred X-Link-Shorter myself.

As an aside, Simon Willison's bookmarklet is brilliant!

Regards,

Rob...

Sun, 12 Apr 2009 at 08:28:18 GMT Link


14.Simon Reinhardt said:

Note that the current draft for the Link header "unspecifies" the rev parameter. It is allowed but has no meaning really. You can still fake its effect though by setting the link value to the requested URI and the anchor parameter to the short URI.

Sun, 12 Apr 2009 at 18:06:12 GMT Link


15.Richard said:

Okay, I'm kind of confused at the moment.

Chris, in your original article, the canonical link is the shortened one. In the Ars Technica ones, it's the full link, and they then have an "alternate short_url" for the shorter one. Which is correct?

Also, doesn't this matter little for Twitter clients, since Twitter will already have the shortened URL, and you would use other ways to get the long URL if desired?

Sun, 12 Apr 2009 at 19:03:59 GMT Link


16.Dave Marshall said:

I much prefer the idea of using a header and I don't mind what it is as long as everyone decides on one!

Took me five minutes to change my url shortener to Curl a HEAD request and check for X-Rev-Canonical in preference to generating a link itself.

http://lnkd.in

Sun, 12 Apr 2009 at 21:40:21 GMT Link


17.Simon Reinhardt said:

Well, since the Link header has been part of HTTP 1.0 and is supposed to be the equivalent of <link> in HTML it is the obvious solution for this - no need coming up with a new header. :-)

Sun, 12 Apr 2009 at 21:45:00 GMT Link


18.Chris Shiflett said:

Dave, very cool. :-)

I think Link might be more appropriate and will update this post once I determine the best syntax. (I'm not yet convinced of Ben's suggested syntax, because I haven't spent enough time researching.)

Sun, 12 Apr 2009 at 21:46:42 GMT Link


19.phlo said:

If this is only meant for space-constrained services like twitter, why not just stick to

tinyurl.com/asdf?

The URL is shortened for transfer to space-constrained devices like cell phones, people can easily transcribe it from cell or twitter to address bars (short-term usage) yet there won't be any linkrot because User-Agents won't give a shit about the tinyurl.com/asdf part; treating it as the people-friendly text it is.

Sun, 12 Apr 2009 at 23:44:42 GMT Link


20.Chris Shiflett said:

After discussing this with several people, I have updated the post to recommend Link:

<?php
 
header('Link: <http://tr.im/revheader>; rev=canonical');
 
?>

What appears to be a disagreement about the proper syntax is due to a recommendation to use an intermediate syntax until canonical is deemed official in some way, but I agree with Ben's view on this:

Since Google introduced the canonical relation type with the rel attribute, then there is an authority on what canonical means as a rel/rev type, so there's already an understood definition for the type, and there's no need for the URI.

Therefore, I'm supporting Link exactly as I've used it in this post (and in this comment).

I hope you do, too.

Mon, 13 Apr 2009 at 04:45:57 GMT Link


21.Leslie Michael Orchard said:

Just to throw this into the mix here - my only minus against the Link: header is that it doesn't help where I can host static HTML, but not set custom headers. Such environments exist, particularly where blogs are hosted.

So, I think there's still a need for an in-page hint.

Mon, 13 Apr 2009 at 15:16:32 GMT Link


22.Clint Ecker said:

I'm updating Ars to use the new Link header. In both cases, we also specify the hint in a <link> element in the page head.

Mon, 13 Apr 2009 at 15:51:13 GMT Link


23.Chris Shiflett said:

Leslie, a bug in my blog caused your name to be lost. I found it on your site; hope it's right. I'll try to figure out what happened and fix it.

I'm not suggesting people support the Link header instead of rev="canonical". I'm just suggesting people use both when possible. For static resources, it's not as easy, so I can understand people not bothering. For blogs, it's simple, and I think Rob's plugin already supports it for WordPress.

Clint, that's great. :-)

Mon, 13 Apr 2009 at 16:02:31 GMT Link


24.Ben Ramsey said:

Leslie, I agree that there needs to be a hint in the mark-up, because this gives user agents the ability to present the short URL in a special way to the user (if desired). However, most popular blogging software is written in PHP, Python, or Ruby, and each of these languages can set HTTP headers at the language level, so your host most likely has support for setting custom headers.

Mon, 13 Apr 2009 at 16:21:20 GMT Link


25.Shashi said:

Our application http://junta.in uses the recommendations in this post to map short URLs.

Sat, 18 Apr 2009 at 17:05:04 GMT Link


26.Andy Mabbett said:

@Chris Shiflett, #4, belatedly:

Google only accepts rel=canonical within the same domain. My solution addresses the problem (and Matt's concern) of determining veracity across domains.

Sat, 27 Jun 2009 at 22:43:07 GMT Link


27.Ronald said:

A little hard for a rookie like me, but useful. I also thought you'd like to know there is a great domain name at Godaddy.com that you may be interested in. It's call PHPDEVELOPING.COM and I think its a good fit for you because your a great PHP programmer. You can contact me at my email address and I'll help you get to it if you want. Again, just thought you'd like to know.

Thu, 02 Jul 2009 at 01:55:07 GMT Link


28.Stock Market Today said:

Understanding how sessions work is your best tool when it's time to debug a problem.

192.168.l.l

Sun, 12 Jun 2016 at 08:54:43 GMT Link

