Deciding Whether to Drop Anchor

One goal in the move to new software on Workbench is to salvage incoming links from other sites. When you break weblog entry permalinks, you break links on every site that referred to your entries. Because I often use weblog archives as a research tool in my programming, I don't want to hose permalinks when switching from Radio UserLand to my hand-coded LAMP software.

I thought I could write a short PHP script to redirect each old Radio-style link to its new link -- just grab the anchor portion of the URL that follows the pound sign ("#"), make a MySQL database query to match the old anchor to the new one, then redirect the request to the proper resource.

No such luck.

The portion of a URL that follows "#" is called the fragment identifier. Radio finds an entry's internal ID number with this fragment -- for instance, #a1833 refers to entry 1833 in the weblogData.posts table.

Much to my surprise, fragment identifiers are not passed by a Web browser to a server as part of a URL request. Instead, they're used strictly by the Web browser, as described by the W3C:

Interpretation of the fragment identifier is performed solely by the agent that dereferences a URI; the fragment identifier is not passed to other systems during the process of retrieval. This means that some intermediaries in Web architecture (such as proxies) have no interaction with fragment identifiers and that redirection (in HTTP [RFC2616], for example) does not account for fragments.

When you request http://ekzemplo.com/view.html#note, the server receives only the request for http://ekzemplo.com/view.html. The browser loads the page and then jumps to the anchor named note.
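A minimal sketch of that split, using Python's standard urllib.parse module rather than anything from Radio or my own software. The function name split_for_request is made up for illustration; it separates the part of a URL a client sends in the request from the fragment it keeps to itself.

```python
from urllib.parse import urldefrag


def split_for_request(url):
    """Return (URL used for the request, fragment kept by the client)."""
    # urldefrag splits off everything after the first "#".
    without_fragment, fragment = urldefrag(url)
    return without_fragment, fragment


sent, kept = split_for_request("http://ekzemplo.com/view.html#note")
print(sent)  # http://ekzemplo.com/view.html
print(kept)  # note
```

Anything running on the server can work only with the first value; the second never leaves the browser.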

Because the browser doesn't share the fragment identifier, there's no way for a Web application, Apache's mod_rewrite module, or anything else on the server to take action based on that part of the URL.

More details can be found in a proposed Internet draft about fragments and URL redirection.

To retain old permalinks, my new software puts two anchors in an old entry -- its Radio ID and its new ID.
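A minimal sketch of the dual-anchor idea, not the site's actual markup. The "a" prefix follows the Radio-style #a1833 fragment mentioned earlier; the "e" prefix, the new ID 2045, and the function name entry_anchors are all hypothetical, chosen only for illustration.

```python
from html import escape


def entry_anchors(radio_id, new_id):
    """Build two empty named anchors for one entry.

    The first keeps the old Radio-style fragment (such as a1833) working;
    the second uses a hypothetical new-style name with an "e" prefix.
    """
    names = [f"a{radio_id}", f"e{new_id}"]
    return "".join(
        f'<a name="{escape(name, quote=True)}"></a>' for name in names
    )


# Hypothetical IDs: old Radio entry 1833, assumed new entry 2045.
print(entry_anchors(1833, 2045))
```

Because both anchors sit at the same spot in the page, an old link ending in #a1833 and a new-style link land on the same entry without the server needing to see either fragment.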

Perhaps this is old news, but I'm amazed at the design implications of this fragment issue for entry-based publishing tools like Radio. Web servers never see the full permalink generated by that software (or by my own, which also uses "#" links). I'm thinking about recoding to drop fragment identifiers from permalinks, but I can't figure out how to do it without adopting a one-entry-per-page approach to archives, which I'd like to avoid.

Comments

Did you get the message I sent last week using this website? If not, send me an email at bill at my domain and I'll send it again.

Can't you do without fragment identifiers at all?

Like here:
kaste.lv

Or here:
kaste.lv
(back when my blog engine did not support putting title fragment in the url)

mod_rewrite module is cool

This may be the W3C standard and as such the anchor may not be passed on by all browsers.

However, Apache rewrite does have support for allowing anchors using the [NE] option to avoid them being escaped.

As mentioned by the Apache documentation, see:

httpd.apache.org (extended redirection)

I have tested this to work well with IE and Firefox.

All it takes is some server side scripting to escape this (and maybe some nasties) before your application tries to use them as the URL.

- Donovan
