About the Author

Chris Shiflett

Hi, I’m Chris: entrepreneur, community leader, husband, and father. I live and work in Boulder, CO.


Save the Internet with rev="canonical"

Related: A rev="canonical" HTTP Header

Slashdot: Note that rev="canonical" (reverse link) and rel="canonical" (forward link) indicate the same relationship in opposite directions. Also, be careful not to make the assumption that shorter URLs are always better. Obviously, I prefer the URL I'm using, but if you require a shorter one, please use http://tr.im/revcanonical. (I use rev="canonical" to indicate this preference, which is what this post is all about.) For more information about my obsession with URLs, see URL Vanity and URLs Can Be Beautiful. Thanks for reading!

There's a new proposal ("URL shortening that doesn't hurt the Internet") floating around for using rev="canonical" to help put a stop to the URL-shortening madness. It sounds like a pretty good idea, and based on some discussions on IRC this morning, I think a more thorough explanation would be helpful. I'm going to try.

The premise is pretty simple. In order to avoid the great linkrot apocalypse, we can opt to specify short URLs for our own pages, so that compliant services (adoption is still low, because the idea is pretty fresh) will use our short URLs instead of replacements from TinyURL.com (or some other third-party alternative).

This is easiest to explain with an example. I have an article about CSRF located at the following URL:

http://shiflett.org/articles/cross-site-request-forgeries

I happen to think this URL is beautiful. :-) Unfortunately, it is sure to get mangled into some garbage URL if you try to talk about it on Twitter, because it's not very short. I really hate when that happens. What can I do?

If rev="canonical" gains momentum and support, I can offer my own short URL for people who need one. Perhaps I decide the following is an acceptable alternative:

http://shiflett.org/csrf

Here are some clear advantages this URL has over any TinyURL.com replacement:

  • The URL is mine. If it goes away, it's my fault. (Ma.gnolia reminds us of the potential for data loss when relying on third parties.)
  • The URL has meaning. Both the domain (shiflett.org) and the path (csrf) are meaningful.
  • Because the URL has meaning, visitors who click the link know where they're going.
  • I can search for links to my content; they're not hidden behind an indefinite number of short URLs.

There are other advantages, but these are the few I can think of quickly.

With rev="canonical", I can indicate my preferred short URL for the canonical one. I just have to hope the idea catches on.

First, I need to make sure my short URL redirects to the canonical URL. I can do this with PHP:

<?php
 
header('Location: http://shiflett.org/articles/cross-site-request-forgeries', TRUE, 301);
 
?>

This results in a 301 (permanent) redirect, which is what I want. (Thanks to Vanessa's comment, I have learned that this is interpreted the same as rel="canonical".)

With my short URL redirecting to the canonical one, I just need to add rev="canonical" to the canonical (long) URL:

<link rev="canonical" href="http://shiflett.org/csrf" />

If Twitter adopts this, then whenever someone uses the canonical URL, Twitter will replace it with my preferred short URL instead of some TinyURL.com garbage. Wouldn't that be nice?
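To make the consuming side a little more concrete, here is a minimal sketch of how a service might look for a preferred short URL in a page it has already fetched. The function name find_short_url() is hypothetical (it is not part of the proposal or of any library), and it assumes the HTML is already in a string. It uses PHP's DOM extension rather than a regular expression, because real-world markup is rarely tidy.

```php
<?php

// Hypothetical helper (an illustration, not part of the proposal):
// return the URL a page advertises via <link rev="canonical">,
// or null if the page does not advertise one.
function find_short_url(string $html): ?string
{
    // DOMDocument::loadHTML() rejects an empty string in PHP 8.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rev holds a space-separated list of link types.
        $types = preg_split('/\s+/', strtolower(trim($link->getAttribute('rev'))));
        $href = $link->getAttribute('href');

        if (in_array('canonical', $types, true) && $href !== '') {
            return $href;
        }
    }

    return null;
}

// Usage: the markup from my CSRF article.
$html = '<html><head><link rev="canonical" href="http://shiflett.org/csrf" /></head><body></body></html>';
echo find_short_url($html), "\n";
```

A real service would also have to fetch the page first and decide whether it trusts the URL it finds, and neither concern is handled here.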

There is some confusion between rev="canonical" and rel="alternate shorter". The former means the current URL is the canonical equivalent of the URL in the href attribute. (Thus, it is the opposite of rel="canonical".) The latter indicates the same thing but also means the URL in the href attribute is shorter. In practice, all you really need is rev="canonical", as indicated by Dopplr's support:

<link rev="canonical" href="http://dplr.it/brooklyn" /><!-- http://revcanonical.appspot.com/ -->

There is a tool you can use to test Dopplr's implementation, test my example, or test your own.

I like to give credit where credit is due, so I asked Kellan Elliott-McCrea (@kellan) to tell us about the idea's history:

The idea emerged in conversation between myself, Les Orchard, and Kevin Marks. (Rafe Colburn suggested something similar about 2 years ago.) Niall Kennedy and Shawn Medero provided useful comments. I just documented and wrote the code.

It is already being supported by Dopplr, PHP.net, Ars Technica, and Flickr. Let's hope Twitter jumps on the bandwagon soon!

If you use Twitter and want to join the discussion, please use the #revcanonical tag. You can also follow me (@shiflett), since I'm sure to be interested in this for a while longer. :-)

About this post

Save the Internet with rev="canonical" was posted on Fri, 10 Apr 2009.

66 comments

1.Eli White said:

Your explanation misses a few, I think important, pieces.

1) You say that rel="alternate shorter" means the 'same thing' as rev="canonical" ... not 'quite' true by the way that the HTML standard works. The 'rel' version here just means, "Hey, here is another URL that happens to be a shorter version for this page." The rev="canonical", however, is saying that the specific URL you happen to be on *IS* the canonical URL, and that the other URL you specify just happens to be another version of it. (The proposal assumes it will be a shorter one.)

2) The rev="" nomenclature is deprecated in HTML 5, and therefore if we are adding something 'new', we should really avoid it.

But really:

3) The statement rev="canonical" can really ONLY exist on a webpage served under the true canonical URL of a page. If you are on another URL that happens to bring up the same page, then you cannot/should not include it. After all, that tag is claiming that the URL you are at IS the canonical one.

But what this means, is that it's less useful. Especially in cases, such as for example php.net. The canonical URL of something on php.net might look like: http://php.net/manual/en/security.php ... However, you will never actually be served HTML under that canonical URL, instead you are auto-redirected to a mirror, such as: http://us3.php.net/manual/en/security.php ...

So since you never serve HTML under the real canonical URL, you never have a proper location to put a rev="canonical". You wouldn't put that on the us3.php.net page, because that hostname is certainly not the canonical one.

To that end, the rel="alternate shorter" is a much better (IMO) standard. Because that standard doesn't make a 'claim' about canonical-ness at all. It just states in the simplest form: "Hey, want a shortened URL for this page? Use this one" Which in this case, is http://php.net/security

So, you HAVE to have the rel="alternate shorter" syntax anyway, to allow for showing a shortened URL when not ON the canonical URL in the first place.

So why bother having the rev="canonical" in the first place? It's less generically useful, you can always just use the rel="alternate shorter" version anyway ... plus it's not HTML 5 compliant anyway.

Fri, 10 Apr 2009 at 18:15:46 GMT Link


2.Jordi Adame said:

this is a great idea! not only for blog/site owners. I'm not a URL junkie like you :), but when I click a link on twitter I hate not knowing where that URL is taking me.

It's all about the meaning!

Fri, 10 Apr 2009 at 18:17:19 GMT Link


3.Brian Moon said:

A simple 301 redirect would achieve the same thing. If you can issue a header() function in PHP, you can send a 301. I think the canonical stuff was added to Google and others for people that can't send headers from their content due to their limited architecture or even in the case of static content.

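<?php

// Send the status line explicitly; the Status header covers CGI/FastCGI setups.
header('HTTP/1.1 301 Moved Permanently');
header('Status: 301 Moved Permanently');

// Point the client at the canonical URL.
header('Location: http://shiflett.org/articles/cross-site-request-forgeries');

?>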

Fri, 10 Apr 2009 at 18:21:38 GMT Link


4.Brian Moon said:

Eli, according to Google's blog post about rel="canonical" you have it backwards. Now, it could be them, but the 800lb. gorilla will dictate usage.

From http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html:

Now, you can simply add this <link> tag to specify your preferred version:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

inside the <head> section of the duplicate content URLs:

http://www.example.com/product.php?...ory=gummy-candy

http://www.example.com/product.php?...;sessionid=5678

and Google will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well.

Fri, 10 Apr 2009 at 18:26:18 GMT Link


5.Chris Shiflett said:

Brian, a more direct way to do this is as follows:

<?php
 
header('Location: http://shiflett.org/', TRUE, 301);
 
?>

I prefer the temporary redirect, because my view is that the short URL is not deprecated in any way. The canonical URL is my preference, but I'm advocating the use of the former when a short URL is required.

When I use a 301, I believe I am communicating that the original URL should not be used in the future, just as the spec suggests.

Hope my reasoning makes sense.

Fri, 10 Apr 2009 at 18:31:17 GMT Link


6.Chris Shiflett said:

Some people have asked about those with long domains. Dopplr's solution is one possibility:

<link rev="canonical" href="http://dplr.it/brooklyn" />

By using dplr.it instead of dopplr.com, they save a few characters.

Sean also pointed out the fact that you can also at least indicate your preferred short URL, even if it's not your own. This isn't quite as good, because you rely on a third party, but it's still better than nothing.

Fri, 10 Apr 2009 at 18:36:16 GMT Link


7.Brian Moon said:

Heh, see what happens when you have been coding in PHP for 12 years? You don't know when some obscure feature is added to an existing function. I really would not expect the HTTP response code to be set by a third optional parameter of the header() function. That sounds like someone thinks the only use for the header function is the Location header. How do I set the status code to 404 for example? I don't need to send a header for that. Man, what a bad feature to stick on to the end of the header() function.

Fri, 10 Apr 2009 at 18:42:32 GMT Link


8.Chris Shiflett said:

Brian, you can blame me for the idea. I think Derick is to blame for the implementation.

My reasoning was that PHP already decides to change the response status code (to 302) when you add a Location header to the response. Since some people prefer a 301 in those cases, it makes sense to let them indicate the preferred response status code.

I don't think it has any effect when sending any other header.

Fri, 10 Apr 2009 at 18:49:52 GMT Link


9.Sean Coates said:

Good post. I like this idea, but I think it needs some... shall we say "maturation" before it should be adopted globally. If only we had a system that would serve as a Request For Comments on Internet issues... (-;

Anyway, one thought that came to mind is potential hijacking with sites that are vulnerable to XSS. If I were to inject the <link ... /> into another vulnerable site via XSS, then that site's shorteners would point at the rogue site. Consider: http://example.com/xssvulnerablepage?inject=%3Clink%20rev%3D%27canonical%27%20href%3D%27http%3A%2F%2Fevil.example.org%2F%27%2F%3E

This would be especially bad on sites with persistent XSS vulnerabilities.

S

Fri, 10 Apr 2009 at 18:53:38 GMT Link


10.Brian Moon said:

Sean, Google stated in their blog post that they would be ignoring rel="canonical" for domains other than the current domain and that a 301 should be used to move a URL to another domain. I think that is a good rule for search engines for sure. Maybe for others.

Fri, 10 Apr 2009 at 19:02:18 GMT Link


11.Stan Vassilev said:

Interesting idea, syntax/technical issues aside, I think we have some systematic problems:

- People with long domains will end up with short urls that are long, so people will keep using third party short url services.

- You mentioned elsewhere Dopplr suggests we buy a second short domain for this purpose. Most people won't enjoy buying and paying for their domains twice, while third party shorteners like TinyURL are free.

- Putting redirection and the meta information in place may be a technical hurdle for some people who don't use pre-made platforms like Wordpress. Third party services require no knowledge.

- If we all set up our sites with short url meta info and redirection, we still need the cooperation of other parties to put it in front of people, so they use it. 1) One way is browser integration. However, this is a chicken-egg problem, as browser vendors try to keep their UI clean and so won't make this immediately visible. 2) Or we can hope services like TinyURL respect the meta information instead of returning their own URL, which means asking them to assist in their own undoing.

- When used in informal messages like email, chat, IRC, IM, SMS, short url-s are used in the short term, and so don't contribute to a future link rot problem that would harm the web significantly.

- It is with certain new services like Twitter, when this problem occurs, because of a combination of unique features (limited message length, and persistence at the same time). So it's ultimately up to Twitter to expand those url-s and store them, so they can use it to recover information if a third party service goes down.

In the end, if people wanted to go through integrating url redirection in their sites, to protect against content/link rot, they would also most likely not use services like TinyPic and Flickr to post (and link) to their pictures on the web.

Also, notice, if you go on Twitter Search, there is already a small [expand] next to each short url, so they may already be caching this information on their servers, as a form of optimization.

Still, articles like yours are helping a lot in bringing the problem forward to Twitter and sites like it, so thanks for spreading knowledge on the issue.

Fri, 10 Apr 2009 at 19:36:04 GMT Link


12.Chris Shiflett said:

Kellan has a post that gives more information about this idea's history:

http://laughingmeme.org/2009/04/03/...tening-hinting/

Fri, 10 Apr 2009 at 20:19:33 GMT Link


14.Ben Ramsey said:

I have a rebuttal on the usage of rev="canonical" that I've posted on my own blog here: http://benramsey.com/archives/a-revcanonical-rebuttal/

Sat, 11 Apr 2009 at 04:27:30 GMT Link


15.Vanessa Fox said:

As I replied to you this morning on Twitter, it won't work to do a 302 and use rel=canonical.

http://twitter.com/foundconf/status/1491109155

And as Danny Sullivan noted on Twitter, a 301 would work fine for your purposes.

http://twitter.com/dannysullivan/status/1495640489

If you're redirecting the short URL anyway, just do a 301 rather than a 302 and you won't need rel=canonical, as Google treats rel=canonical in a way very similar to a 301. In fact, that's why doing a 302 *and* rel=canonical isn't a good idea -- the search engines treat 302s one way (as temp redirects) and 301s/rel=canonical another way (perm redirect), so if you do a 302 and use rel=canonical, you're giving the search engines conflicting data.

The other advantage to a 301 vs. rel=canonical is that while all the search engines support rel=canonical, only Google has fully implemented it at this point.

(As noted by earlier comments, rel=canonical doesn't work across domains.)

I wrote about rel=canonical when it launched here:

http://searchengineland.com/canonical-tag-16537

(I also worked on this project when I worked at Google.)

Danny wrote an article about the pros and cons of each URL shortening service, particularly as they relate to SEO, although I realize you have a different point behind why you don't want to use them.

http://searchengineland.com/analysi...d-you-use-17204

Sat, 11 Apr 2009 at 05:59:41 GMT Link


16.Marcel Esser said:

1) There are already millions of tinyurls and equivs out there; this won't protect us from them. So, the 'linkrot apocalypse' will still happen.

2) If you use a third-party service such as tinyurl.com, you should expect that the life of this service is limited to the life of the company. tinyurl.com/xyzxyzxyz isn't a resource, it's a service that redirects to another location. If you use it as a thing of permanence, that is your own error.

3) If you use a service that uses another service, such as Twitter uses tinyurl.com, you should expect that risk to double, just as having 2 engines in an airplane makes you twice as prone to an engine failure. Therefore, it's either poor consideration by the end-user for choosing that service, or poor consideration by the service that stacks upon the other.

4) Retrieving meta-data about short/canonical/alternative/etc attributes of the resource, if you are embedding it into the response, requires grabbing that resource and analyzing it. This is significantly more computational effort. Additionally, as has been pointed out in other places, including by Sean Coates, this can be a serious issue with XSS problems as well.

5) There already is an alternate identifier for URLs, especially in the scope of some visual token that is supposed to be shorter and fit into 140 character 'tweets'. That identifier is the value of the anchor tag. 'Link #1' makes just as much sense to me as 'tinyurl.com/xyzxyzxyz' - namely, none. The big difference is that I can easily hover over the former and see what resource it's actually pointing to.

6) Any system that suggests greater/lesser weight of how 'canonical' a URL is, should be taken apart by a small army of deeply involved security professionals before we introduce yet another massive vulnerability to an already broken internet.

7) The other major reasons for wanting shorter urls basically break down to a) clients that implement existing standards poorly (i.e. e-mail clients that break URLs), or b) URLs that can be communicated in non-written communications more easily. The former is a total cop-out to let people get away with not fixing their code. The latter might be a point, but the practice of creating easy-to-speak URLs is not the same as creating shorter URLs.

8) The entire drive for longer URLs is largely due to Google, and possibly some other search engines, stating that longer URLs are more indicative of their content, and thereby somehow get you bonus points. While that is probably valid practically, it's something that needs to desperately go away. There is simply no point to preserving it in the modern web other than the fact that it's become accepted practice.

Sat, 11 Apr 2009 at 06:28:42 GMT Link


17.Chris Shiflett said:

Nice post, Ben. I'll reproduce the meat of my comment here:

Implementing rev="canonical" for sites like Dopplr and Flickr takes very little effort. Implementing it for sites like Twitter requires an HTTP request and some HTML parsing every time. That's a lot to ask, and I think it's the biggest obstacle rev="canonical" faces.

Thanks for the detailed explanation, Vanessa. I'm now convinced that 301 is more appropriate, and I've updated the post (and my example) to reflect that.

Sat, 11 Apr 2009 at 14:48:59 GMT Link


18.till said:

Maybe they could use a HEAD-request to return the URL -- vs. parsing the HTML.

Sat, 11 Apr 2009 at 15:04:57 GMT Link


19.Chris Shiflett said:

A HEAD request would be a good idea, till, but it won't help in this case.

The scenario is that someone wants to link to my CSRF article, so they use the canonical URL in their Twitter update. If Twitter wants to determine whether I have a preferred short URL, they have to search the HTML (content, not the headers) for the following:

<link rev="canonical" href="http://shiflett.org/csrf" />

There are still some shortcuts that can make this easier, but it's taxing enough to matter on a high-traffic site.

Sat, 11 Apr 2009 at 15:14:37 GMT Link


20.Chris Shiflett said:

Another relevant post:

http://adactio.com/journal/1566/

Sat, 11 Apr 2009 at 16:28:06 GMT Link


21.Chris Shiflett said:

A rev="canonical" bookmarklet from Simon:

http://simonwillison.net/2009/Apr/11/revcanonical/

Sat, 11 Apr 2009 at 17:04:31 GMT Link


22.Chris Shiflett said:

Jeremy Keith urges early adopters to look for rev="canonical" in <a> tags as well as <link> tags:

http://adactio.com/journal/1568/

Sat, 11 Apr 2009 at 18:05:27 GMT Link


23.David Dollar said:

If you are so determined to avoid having your URL pasted into link-shortening services, why not just use the shorter URL as the "canonical" URL to begin with and avoid having to muck around with HTML specs.

If http://shiflett.org/csrf is *the* URL to your content, you avoid this whole canonical nonsense, you avoid people having to follow the link in order to figure out the "short" version, you avoid the problem entirely.

Sat, 11 Apr 2009 at 22:41:34 GMT Link


24.Chris Shiflett said:

Fair question, David.

In this particular case, I strongly prefer the long URL:

http://shiflett.org/articles/cross-...quest-forgeries

Although I could replace cross-site-request-forgeries with csrf to help a little, it is inconsistent with the titles of my other articles.

The reason to leave articles in the URL has to do with information architecture. I don't want all my URLs living within /, because I would have to make naming sacrifices, and because it would lead to disorganization.

It may seem like a small matter, but consider the following alternative to the current URL (which doesn't work):

http://shiflett.org/revcanonical

That's 32 characters, so it might not even be short enough, but it also means every other post about rev="canonical" would need to use a different slug, and I'd quickly pollute / and always be making naming sacrifices to compensate.

Sat, 11 Apr 2009 at 22:52:34 GMT Link


25.David Dollar said:

Doesn't that problem exist anyway if you have to create "shortened" versions of all of your long URLs?

Sat, 11 Apr 2009 at 22:55:25 GMT Link


26.Chris Shiflett said:

Yes, which is why a lot of people supporting this idea use identifiers rather than slugs. It's not ideal, but as an alternative to some TinyURL.com replacement, I'm sure Simon prefers this:

http://swtiny.eu/EZa

Consider also PHP Advent. As a URL, it's nice to have the full title and author:

http://phpadvent.org/2008/php-witho...p-by-terry-chay

I'll probably add support for rev="canonical" (including the header) at some point (unless Sean or Jon beat me to it), and a good short URL for this is:

http://phpadvent.org/0824

It's almost meaningful enough to use as the canonical URL, but not quite. I prefer the current one.

Sat, 11 Apr 2009 at 23:08:30 GMT Link


27.David Dollar said:

Since this really seems to be a Twitter-specific problem (do any email clients still suffer from URL breakage?) I think the best solution would be to have Twitter auto-shorten long URLs using a shortening service they provide.

That way, the shortened URLs are no more volatile than the medium in which they travel. Having the solution be implemented in only one place also has great benefits, rather than expecting every CMS and custom code out there powering a website to support a new <link> standard.

It seems like the rev="canonical" won't be useful until *every* site is using it, because if you don't know, you just default to playing it safe and using a shortening service. That's setting quite a tall bar for adoption, and seems likely to fail.

I hope these comments don't come across as negative. I understand the problem exists, I just don't think this particular solution is viable.

Sat, 11 Apr 2009 at 23:08:36 GMT Link


28.Chris Shiflett said:

"I hope these comments don't come across as negative."

Not at all. I appreciate the reality check.

The solution doesn't have to be absolute or ubiquitous in order to be helpful. If I'm about to link to Simon's post on the bookmarklet he wrote, I'd like to know to use his preferred URL:

http://swtiny.eu/EZa

Even if Twitter doesn't support it, I can use his bookmarklet to quickly and conveniently figure it out, and some Twitter clients like Spaz are already planning to add support. (Possibly only supporting the header.)

Even in my case, where I don't have a good URL-shortening solution at shiflett.org, I can at least suggest my preferred short URL via rev="canonical" and Link:

http://tr.im/revcanonical

I don't think anyone is expecting this to be an extremely widespread solution, but it does have a lot of positive momentum right now. Plus, it's a good idea, and I like spreading the news about good ideas. :-)

Sat, 11 Apr 2009 at 23:16:20 GMT Link


29.Jonathan Stark said:

Hi all -

Great post and great comments. I was glad to see DD's comments because I was thinking precisely the same thing. I recognize that the problem exists, but think it's something that should be addressed by Twitter (or maybe SMS), not every publisher worldwide.

That said, I don't know how Twitter could do anything about it if they want to continue to support updates via SMS. I guess that the idea of "pre-expanding" urls from known shortening services would be a good start. At least then the middleman is removed as a point of failure, etc...

Best,

j

Sun, 12 Apr 2009 at 21:04:23 GMT Link


30.Ilia Alshanetsky said:

I believe other people have mentioned it before in various rev="canonical" articles, but I'll revisit the matter. In most cases URLs are generated automatically, so it would be a trivial matter to simply crc32 the URL and make a hex value (8 chars long) representing it. When generating the link this is what would be used rather than this-is-my-really-verbose-title-for-something-or-rather. You could apply the same logic around any built-in local links via a regular expression.

As far as the rev="canonical" itself, I think it would be much better implemented via an HTTP header such as the "X-Rev-Canonical" you've suggested, as it would mean it could be obtained via a simple HEAD request without having to parse the HTML code to extract it.

Sun, 12 Apr 2009 at 21:07:02 GMT Link


31.Chris Shiflett said:

Jonathan, I think Twitter could do a few helpful things, even if they never support rev="canonical":

1. Don't mess with my URLs if I'm within the 140-character limit. The fact that they do is a major source of annoyance for many, myself included. In fact, if they stop doing this, they'd never need to support rev="canonical", because they'd never mangle a URL.

2. If they really feel the need to mangle URLs, they could store the original and only use the short URL when required (SMS). Even if their site used the short URL as link text and the real URL as the href, it would be a step in the right direction; we could at least hover over links to see where they go.

Ilia, I agree about the header, which is why I proposed it. Any early adopters considering it should be prepared to make changes, because it looks like Link is more appropriate. I'll update the post once I determine the best syntax.

Sun, 12 Apr 2009 at 21:16:55 GMT Link


32.Rob Allen said:

(Blowing my own trumpet here a little, sorry Chris!)

My Shorter Links WordPress plugin creates HTTP headers (X-Rev-Canonical and Link) along with the <link> element. An example can be seen at http://akrabat.com/sl

Regards,

Rob...

Sun, 12 Apr 2009 at 21:39:58 GMT Link


33.Alan Hogan said:

Brian, I think you misunderstood Eli's point, which makes complete and compelling sense to me. How can something have an inverse-canonical relationship to the current page if the current page is not canonical?

Sun, 12 Apr 2009 at 22:29:16 GMT Link


34.rrhe said:

But what about icanhascheezburger.com? their domain is already huge

Mon, 13 Apr 2009 at 01:04:11 GMT Link


35.pbhj said:

@Alan, I didn't get it at first but I think I do now: the rev=canonical does go on the canonical (long URL) page and the href attribute is the short URL - "this page is canonical and this resource links here".

The short URL is 301 redirected (eg by sending a Location header with PHP). There is no actual resource at the short URL just a redirect.

I think the idea is that you use the short URL on Twitter to avoid it being shortened. Fair enough. That it redirects to the long URL. OK. And that Twitter should parse this new page and look for a rev=canonical attribute so they can provide a transparent shortened URL giving a full URL, eg onmouseover, as an option on Tweets. Err ...

What I don't now get is

a) why twitter would use all those resources downloading and parsing millions of pages that don't even have the relevant link element? Where's the ROI?

b) why if they wanted to they wouldn't just follow links and get the HEAD and see if it redirects?

It seems (b) would be easier on resources and give the same effect from what I can tell. The only benefit of (a) then is having a parseable short URL for the linkerati to use to get to your resource.

The benefit of that over using external URL shorteners? Some boost in domain authority maybe?

Am I right?

2 other quick issues - 1) what if there are two resources noted as referring back to this resource? Which is considered the canonical shortened canonical link? 2) what if the shortened link is not short enough what does Twitter do now?

Mon, 13 Apr 2009 at 01:14:40 GMT Link


36.Robert Kosara said:

Why not make this a header field? Here are two reasons why I think that would be a good idea:

a) It's quicker to access and easier to parse. That makes it more likely that Twitter and a variety of Twitter clients will implement it.

b) It also works for non-HTML content. What if I want to link to an image, a PDF file, or an RSS feed? This proposal should work for anything that can be linked to.

Mon, 13 Apr 2009 at 01:46:29 GMT Link


37.Robert Kosara said:

Ah, never mind, I just saw that you already have a posting about this. Great!

Mon, 13 Apr 2009 at 01:53:47 GMT Link


38.Chris Shiflett said:

pbhj, you wrote:

why if they wanted to they wouldn't just follow links and get the HEAD and see if it redirects?

The page you're currently reading indicates a preferred short URL (http://tr.im/revcanonical), but note you're not being redirected there. No one can know about the preferred short URL if I don't indicate it in some way. This is what rev="canonical" is all about.

Robert, see here:

http://shiflett.org/blog/2009/apr/a...cal-http-header

Mon, 13 Apr 2009 at 01:57:32 GMT Link


39.Subbu Allamaraju said:

I am missing your point. If you are able to come up with a shorter URI for your resources, why not just use that shorter URI in the first place?

Mon, 13 Apr 2009 at 03:43:38 GMT Link


40.Chris Shiflett said:

Subbu, the obvious answer in this case is that I prefer the current URL:

http://shiflett.org/blog/2009/apr/s...h-rev-canonical

Did you read my post?

If this is too long for your particular use case (Twitter, email, etc.), I'm indicating a preference for the following:

http://tr.im/revcanonical

Even if I ignore the fact that I don't own tr.im, I would have to assume that shorter is always better for your question to make any sense at all. I don't.

Mon, 13 Apr 2009 at 04:03:27 GMT Link


41.Lanky said:

I've added it already to my own site. I have a few sub-domains that make the URLs quite long. Using the original domain and some mod_rewrite rules I have "shortened urls" for my services.

Mon, 13 Apr 2009 at 07:56:02 GMT Link


42.nofb said:

The whole idea is grotesque.

Generating extra traffic and server load, introducing a security risk and more complexity in a system that is already hard to keep tidy, all that for the sake of using ridiculously long URL's? Please some study, show them the cost...

Would you prefer a long phone number because it's prettier and matches a statement? No, you don't care because phones have a contact database and you usually don't even pay attention to the real number. Or would you prefer calling that person because you like the digits in their number?

"Oh darling, let's move! I've seen a house over there with a lovely street name, and the number is so pretty, it has my favourite digit!". Right, got carried away here ;-)

With URLs it's the same, who cares? Most of the time you probably won't see it because it's too long for the address bar, shortened as it is with the annoying Google search field or whatever you try to squeeze up there. That's what bookmarks are for!

Or links. Would you rather trust links that get you to a very long address you can't see entirely, do you feel safer and does it smell like a spam-proof URL? Spammers are parasites, and by definition they adapt, sorry but you're not safe anywhere.

Use tiny URLs in the first place, and keep it simple!

Mon, 13 Apr 2009 at 09:45:43 GMT Link


43.Alan Hogan said:

Wow, no one is addressing Eli’s issue with rev="canonical" on PHP.net's non-canonical pages. Don't make me go all-caps here. Rev=canonical would be great if you always ended up on the canonical page... but what if your site serves www and no-www variants, even? You should only put rev=canonical on one of them.

I get what is being suggested and why, but it would seem rel="alternate shorter" or some such makes more sense.

Mon, 13 Apr 2009 at 09:53:26 GMT Link


44.Robert Kosara said:

@Alan: Why can't a page have both rev=canonical and rel=canonical? If it has both, it's clear that the shorter URL pointed to by rev=canonical points to the canonical page, not this one. And even if you ignore the rel=canonical, using the shorter URL will do the right thing: take you to the canonical page. Are there cases where that is not the correct behavior?

Mon, 13 Apr 2009 at 11:18:55 GMT Link


45.Randy Kramer said:

nofb said:

> Generating extra trafic and server load, introducing a security risk and more complexity in a system that is already hard to keep tidy, all that for the sake of using ridiculously long URL's? Please some study, show them the cost...

In some cases, the software running a site dictates a long URL, for example, on WikiLearn (at twiki.org), URLs typically look like this:

http://twiki.org/cgi-bin/view/Wikil...AboutThesePages

I'm sure we've all seen worse examples.

Mon, 13 Apr 2009 at 13:35:56 GMT Link


46.Craig said:

I just requested that Identi.ca support this feature in laconica in the future. http://laconi.ca/trac/ticket/1420 I think this is a great idea - I can't wait to see adoption rise.

Mon, 13 Apr 2009 at 18:49:49 GMT Link


47.Phil said:

Alan, Eli - I feel your pain. What is being proposed here is a method for specifying preferred shortened forms for URLs, but what 'rev="canonical"' actually means is that some other, possibly shorter URL will redirect you to the current one, the current one being (duh) "canonical". Whatever "canonical" means in this age of temporary and permanent redirects, load balancing, iframes, byte ranges, dynamic content and all other manner of nasties, it isn't "shorter". The two words aren't generally considered synonyms, as far as I'm aware. If the current URL is a load-balanced mirror, it isn't canonical, and that's that.

Putting both 'rev="canonical"' and 'rel="canonical"' on the same page only "solves" the issue by muddying the semantic waters further; reverse links are supposed to indicate relationships between the href and the current URL, not one specified in another link tag:

http://www.w3.org/TR/html401/struct...s.html#adef-rev

Semantics matter. As with table-based layouts, ignoring them to accomplish a particular goal may be convenient (and on some level, may even appear to get the job done), but in the long run it's just the wrong thing to do, and is something the web needs to move away from. As for the sheer, unrivalled brilliance of proposing a new HTTP header when there's a perfectly good "Link:" header in HTTP 1.1, complete with "rev" and "rel" attributes, I'm speechless.

http://www.ietf.org/rfc/rfc2068.txt

Not to mention the fact that any link to a domain you don't know and trust is risky, regardless of whether or not you think you know in advance that said link is not going to redirect you to a different domain; or the narcissism inherent in pre-emptive creation of short URLs to one's own content; or the inevitable exhaustion of useful namespace on any given domain.

The real problem here is the stupidity of building new, shiny services (Twitter) on top of old, broken ones (SMS); then, when those new services turn out to have inherited the old one's limitations, building yet more services (TinyURL et al) to work around the fact, instead of ditching the lot and building something better. SMS at anything approaching modern-day prices should have died off a long time ago, which would have prevented this from ever being an issue.

Regardless of whether or not there is something of value in the concept, you are not coming across as qualified to suggest this level of implementation detail.

Mon, 13 Apr 2009 at 23:33:53 GMT Link
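
For readers unfamiliar with the "Link:" header mentioned in the comment above, here is a rough sketch of how a consumer might read a rev="canonical" hint from one. It assumes the `<URI>; param=value` shape that RFC 2068 describes, and it is deliberately simplified: quoted parameter values containing commas or semicolons are not handled. The function names and the sample header value are hypothetical.

```python
import re

# One link-value: <URI> followed by zero or more ";name=value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One parameter, with either a quoted or a bare value.
PARAM = re.compile(r';\s*([A-Za-z]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Split a Link header value into (uri, params) pairs."""
    links = []
    for match in LINK_VALUE.finditer(value):
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted if quoted else bare
        links.append((uri, params))
    return links


def short_urls_from_link_header(value):
    """Return the URIs whose rev parameter includes the "canonical" link type."""
    return [
        uri
        for uri, params in parse_link_header(value)
        if "canonical" in params.get("rev", "").lower().split()
    ]


# Hypothetical header value; this page is not known to send it.
header_value = '<http://tr.im/revcanonical>; rev="canonical"'
print(short_urls_from_link_header(header_value))
```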


48.Mark Jaquith said:

> reverse links are supposed to indicate relationships between the href and the current URL, not one specified in another link tag

This is a pretty big nail in the coffin of this implementation. On most blogs out there, you can pass ?a_random=query_string and not get redirected. A rev="canonical" link on that URL is wrong, because the current URL is not canonical.

rel="alternate shorter" is better. But let's be honest here… we wouldn't be talking about this if it weren't for Twitter. Twitter could, with less work than it would take to support this rev="canonical" proposal, just do an HTTP HEAD request on every URL that is posted to Twitter, looking for 301/302 responses. For the SMS blasts, they keep the short URL. For the web interface and the API, they expand it to the full URL. They can either have their own tinyurl-to-canonical-url lookup table, or just store tweet_content and tweet_sms_content.

With this solution, it doesn't matter that URL shortening services have limited longevity—they'd only be used for SMSes. And your cell phone has way less longevity than TinyURL does.

Tue, 14 Apr 2009 at 05:53:12 GMT Link
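
The HEAD-based expansion suggested in the comment above might look roughly like this. It is a sketch under stated assumptions, not a description of anything Twitter does: `send_head` and `expand` are hypothetical names, the hop limit of 5 is arbitrary, and the example uses canned responses for made-up `.example` hosts so that it runs without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302}


def send_head(url):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def expand(url, send_head=send_head, max_hops=5):
    """Follow 301/302 responses until a non-redirect answer or the hop limit."""
    for _ in range(max_hops):
        status, location = send_head(url)
        if status not in REDIRECT_STATUSES or not location:
            return url
        url = urljoin(url, location)  # Location may be relative
    return url


# Canned responses standing in for real servers (hypothetical hosts).
fake_responses = {
    "http://short.example/a": (301, "http://long.example/articles/a"),
    "http://long.example/articles/a": (200, None),
}
print(expand("http://short.example/a", send_head=fake_responses.get))
```

Passing the request function in as a parameter keeps the redirect logic testable offline; calling `expand(url)` with the default `send_head` performs real requests.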


49.Ivo Jansch said:

If Twitter is one of the reasons we need short urls, maybe we should ask ourselves what the reason is these urls need to be short. Bandwidth isn't the problem, but screen real-estate is. You want urls to be short because in a 140 char message, it leaves more for actual content.

What strikes me as odd is that no one has suggested that Twitter should implement simple <a href="long url">short word</a> support. This would mean that under the hood, it can use a larger limit, while on the surface, it can have a 140 character limit and still be perfectly readable. Hide the url completely from the user just like we do in normal html content.

Wed, 15 Apr 2009 at 20:49:52 GMT Link


50.Chris Shiflett said:

Ivo, I'm pretty sure I've seen that suggested somewhere.

I think a simpler solution for Twitter would be to simply never, ever shorten URLs. Ever. It's our job to fit whatever we want to say into 140 characters. They can distance themselves from the problem entirely.

Wed, 15 Apr 2009 at 21:13:57 GMT Link


51.Isaac Hildebrandt said:

Ivo, the reason Twitter uses tinyurl is to maximize content real estate, that is correct. But the issue with using an anchor tag is that it is incompatible with SMS. So a 140 char tweet would quickly go over the 160 char limit for SMS messages when you factor in the long url, short word, and extra markup.

Wed, 15 Apr 2009 at 21:14:47 GMT Link


52.Sean Coates said:

Isaac,

If you can't get to the web to check out the actual URL in a tweet, then the URL is useless anyway, isn't it? If Twitter were to go this route, they could just pass the word (possibly with sms-friendly markup like [this]) in the SMS.

S

Wed, 15 Apr 2009 at 21:38:10 GMT Link


53.Les said:

Maybe I've missed something but why not simply use the short url in the first place?

Most words in longer urls are ignored by the search engines anyway - using Shiflett's url for an example?

This practice ain't going anywhere folks and to me, it's yet another buzzword for developers to blog about.

I'm a developer myself and guess what? I've got better things to do, sorry.

Fri, 17 Apr 2009 at 00:01:00 GMT Link


54.Chris Shiflett said:

Hi Les,

I have already answered your question.

> This practice ain't going anywhere folks and to me, it's yet another buzzword for developers to blog about.

Which practice? Indicating a preferred short URL?

> I'm a developer myself and guess what? I've got better things to do, sorry.

Better things like commenting? I don't really understand your point or your tone. Retry?

Fri, 17 Apr 2009 at 02:26:56 GMT Link


55.nik said:

You hope that Twitter starts doing that too, but do not forget they have a 140-character restriction. So how should revs be handled if, for example, tinyurl gives 20 characters and your local rev provides 25? OK, if you know that, you will keep it in mind. But it could cause a lot of broken links as well...

Twitter could also stop the conversion to the rev address when the link conversion leads to more than 140 characters...

Fri, 17 Apr 2009 at 02:31:03 GMT Link


56.Chris Shiflett said:

Hi Nik,

To be honest, I feel like Twitter could avoid implementing anything if they would leave URLs alone. You can't submit an update to Twitter if it's longer than 140 characters, so why should they shorten anything?

If you consider that, then hopefully you can see why it will never matter (to Twitter) what size your short URL is. If it's exactly the same size, it will still fit.

It's still a good point, because the people who want to honor your preference might need to save a few more characters than your preference allows, in which case they'll just use their own.

I don't really think this matters too much. Not everyone will honor your preference regardless, and this idea is more about allowing you to express a preference than it is about trying to force others to honor it.

Thanks for commenting.

Fri, 17 Apr 2009 at 02:45:28 GMT Link


57.Roman said:

If you want to have an alternate short URL, why not just use "a href" that is labeled "short URL for this page", which leads to an address that does 301 back to the original page?

But even that is a rather bad compromise. As I see it, the real problem is not with URLs, it's with Twitter and its limitations.

Sun, 19 Apr 2009 at 15:48:22 GMT Link


58.Erik Vold said:

Great article Chris, I am showing my support for rev=canonical over rel=short* here: http://erikvold.com/blog/index.cfm/2009/4/21/rev_canonical_good

Tue, 21 Apr 2009 at 05:22:22 GMT Link


59.Ivo Jansch said:

Although I do not agree entirely that rev=canonical is the way to go (rel=alternate shorter is more flexible for the intended purpose of shortening urls), until there is something better/more final, I've added support for it to http://flackr.net. All event pages (e.g. http://flackr.net/s/d2al ) contain the rev=canonical header.

Thu, 30 Apr 2009 at 08:08:23 GMT Link


60.savvas said:

Great blog post, great ideas, wonderful intentions!

Can someone please make a properly documented suggestion and forward it to the proper organization (I think w3c.org ?) in order to see it as a standardized implementation?

Wed, 24 Jun 2009 at 07:46:00 GMT Link


61.Kenneth Udut said:

I've implemented this rev="canonical" idea on http://free.naplesplus.us in the hopes that it catches on further.

It wasn't hard to put together. I followed the style used on php.net like so:

http://free.naplesplus.us/articles/view.php/50574/colcourt is the original

and in the header, I have:

<link rev="canonical" rel="self alternate shorter shorturl shortlink" href="http://free.naplesplus.us/go.php/colcourt" />

which is the form used on php.net, which I happen to like, as it takes into account not only the rev=canonical, but ALSO the rel= self, or alternate or shorter or shorturl or shortlink ideas, which are ALSO excellent notions.

If I want to go further, I will probably use mod_rewrite to turn go.php into go

But I figured since I had this GREAT nick-name system already in place that I was hardly taking any advantage of on my CMS (YACS - a little known system out of france), I might as well use it.

It's not MUCH shorter (maybe 5-10 characters shorter at best - and some of them end up being LONGER) - but I figure that it's a beginning of a great idea!

Ken Udut of Naples Florida

Fri, 26 Jun 2009 at 21:28:35 GMT Link
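
On the publishing side, the tag quoted in the comment above could be generated with something like the sketch below. The helper name `short_link_tag` and its `rel_tokens` parameter are hypothetical; the only firm requirement shown is that attribute values are escaped before being written into the markup.

```python
from html import escape


def short_link_tag(short_url, rel_tokens=()):
    """Build a <link> element advertising short_url via rev="canonical".

    rel_tokens is an optional sequence of rel link types to add alongside rev.
    """
    attributes = ['rev="canonical"']
    if rel_tokens:
        attributes.append('rel="{}"'.format(escape(" ".join(rel_tokens), quote=True)))
    attributes.append('href="{}"'.format(escape(short_url, quote=True)))
    return "<link {} />".format(" ".join(attributes))


# Reproduces the form quoted in the comment above.
print(short_link_tag(
    "http://free.naplesplus.us/go.php/colcourt",
    ("self", "alternate", "shorter", "shorturl", "shortlink"),
))
```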


62.Floren Munteanu said:

Interesting article, Chris. I agree with Eli's comment:

"So why bother having the rev="canonical" in the first place? It's less generically useful, you can always just use the rel="alternate shorter" version anyway ... plus it's not HTML 5 compliant anyway."

Sun, 09 Aug 2009 at 21:58:43 GMT Link


63.Alparslan Pehlivan said:

I want to ask a question about rel.

What does rel="friend" mean, and what is it worth? Maybe it is the same as canonical? Do you have any idea?

I follow this topic.

Thanks,

http://www.kaiserdealxa.com/

Alparslan Pehlivan.

Best Regards.

Mon, 05 Jul 2010 at 18:59:02 GMT Link


64.Tony said:

Hi Alparslan,

In my opinion, rel="friend" is more important than rel="canonical", because search engines like friends and their links.

Tue, 06 Jul 2010 at 06:53:34 GMT Link


65.Sylvio Thomson said:

Hello Tony,

I do not agree with you. I think that search engines, especially Google, like canonical because it is new and of high quality.

Thanks,

Sylvio Thomson.

Wed, 07 Jul 2010 at 10:22:32 GMT Link

