Just how many types of anxiety are there, anyway? I got to thinking about this as I read a blog that mentioned "RSS Anxiety." For those of you who have not yet come face-to-face with this little acronym, it stands for Real Simple Syndication and it spreads whatever you want all over the internet, virtually creating an immortal life all its own.
Can you kill an idea once it is out on the internet? No. Can you try to correct it? Yes, but you'll never accomplish this goal. -- Patricia Farrell, author of How to Be Your Own Therapist
Earlier this week, the Press-Enterprise newspaper of Riverside, Calif., ran a business story about an industrial zone six miles from the border of North and South Korea that seeks to strengthen ties between the bitter, long-warring rivals.
The newspaper gave the article the headline, "Business park makes ties that build Korean détente."
The word détente is one of many in English that include a diacritical mark, a symbol such as an accent, cedilla or diaeresis.
The paper included the story in a Really Simple Syndication feed. If the editors take a look at how the headline appears in some of the leading RSS software, they'll discover one of the unfortunate realities of working with the format:
RSS does not allow détente.
The Press-Enterprise gave the headline the following formatting:
<title>Business park makes ties that build Korean d&eacute;tente</title>
The "d&eacute;tente" part is an attempt to get an RSS reader to produce the output "détente". The "é" in détente carries an acute accent, and one way to write it on the web is with the HTML entity &eacute;.
The following screen captures show how this headline appears in eight highly popular RSS readers and web browsers:
[Screen capture: Microsoft Internet Explorer 7]
As you can see, five of the eight display "détente" and the other three display "d&eacute;tente," including the two most popular web browsers on the planet. The difference occurs because the first group expects an RSS item's title to contain HTML, while the second group expects it to be plain text.
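The split between the two groups can be sketched in a few lines of Python. This is only an illustration, not how any of the readers above is actually implemented. It assumes the feed escapes the ampersand as &amp;eacute; so the XML stays well-formed, and the function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Assumption: the ampersand is escaped in the feed, so after XML parsing
# the title text is the literal string "d&eacute;tente".
ITEM_XML = (
    "<item><title>Business park makes ties that build Korean "
    "d&amp;eacute;tente</title></item>"
)


def title_as_plain_text(item_xml: str) -> str:
    """A reader that treats the title as plain text shows it unchanged."""
    return ET.fromstring(item_xml).findtext("title")


def title_as_html(item_xml: str) -> str:
    """A reader that treats the title as HTML decodes entities once more."""
    return html.unescape(title_as_plain_text(item_xml))


print(title_as_plain_text(ITEM_XML))  # entity shown literally
print(title_as_html(ITEM_XML))        # entity decoded to the accented letter
```

Both readers parse the same XML correctly. They disagree only on what to do with the text after parsing, which is the step the spec leaves open.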
The simplest conclusion is that one group's not implementing the RSS title element properly, but there's nothing simple about the issue in the current specification. The spec states that an item's description can be HTML, but it doesn't state whether any other elements can do likewise.
One section of the draft specification attempts to solve the problem in this manner:
For all elements defined in this specification that enclose character data, the text must be interpreted as plain text with the exception of an item's description element, which must be suitable for presentation as HTML.
Dave Winer declared today that the war to clarify the spec is over and everybody won:
We live with the imperfections of RSS 2.0, because that's the way life is. Nothing and no one is exactly as we'd like them to be.
If that's supposed to be the final word on RSS, can somebody tell me how to build détente?
The title of this article as seen in my aggregator, which is a local install of Drupal.
Just for the sake of testing, try using the numeric character code &#233; instead of the entity code. That is, <title>Business park makes ties that build Korean d&#233;tente</title>
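If the suggestion is read as a numeric character reference such as &#233; (an assumption about what the commenter meant), the difference from the named entity shows up at the XML layer. In this rough sketch the helper name is hypothetical. XML predefines only five named entities, so an unescaped &eacute; is not well-formed, while the numeric form is resolved by the parser before any HTML-or-text question comes up.

```python
import xml.etree.ElementTree as ET

# Numeric character reference: resolved by the XML parser itself.
NUMERIC = "<title>Business park makes ties that build Korean d&#233;tente</title>"
# Named HTML entity left unescaped: not defined in plain XML.
NAMED = "<title>Business park makes ties that build Korean d&eacute;tente</title>"


def parse_title(xml_text):
    """Return the title text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


numeric_title = parse_title(NUMERIC)
named_title = parse_title(NAMED)

print(numeric_title)  # title with the accented letter already in place
print(named_title)    # None, because the parser rejects the undefined entity
```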
Could you publish a screencap of the headline from the newspaper feed, Richard?
My software here on Workbench has a separate formatting problem with HTML entity codes in headlines that's above and beyond the problem with them in RSS.
Rogers: give this test feed a shot and see how your various readers render it: www.snellspace.com
Personally, I like characters.
In this particular case, UTF-8 characters. Sure, there are still broken things that will take your UTF-8 and display it as plus whatever, but that takes it out of the hellish RSS "I can't tell you, and you can't guess" situation and into the clean and clear XML, where if you correctly use UTF-8, and correctly say that you are using it, nobody has to hire a consultant to decide who is getting it wrong if something fails to display it.
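A small Python sketch of that point, with a made-up one-element document standing in for a feed: when the bytes are UTF-8 and the XML declaration says so, a conforming parser has no guessing to do. The last lines show the familiar failure, where software decodes the same bytes as Latin-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a literal accented character, encoded as UTF-8,
# with an XML declaration that correctly names the encoding.
feed_bytes = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<title>Korean d\u00e9tente</title>"
).encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
title = ET.fromstring(feed_bytes).text
print(title)

# What broken software does: treat the two UTF-8 bytes of the accented
# letter as two separate Latin-1 characters.
mojibake = "d\u00e9tente".encode("utf-8").decode("latin-1")
print(mojibake)
```

When the second result appears on screen, the fault clearly lies with whatever decoded the bytes, not with the feed.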
Of course, you're still toast if you need to use a less-than (or, in some circumstances, an ampersand), which is why I no longer produce RSS, and will not until there's a spec which says whether or not titles are HTML, in a way that makes me think the spec author actually understood the situation, but as you say, that's apparently not an option.
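To make the less-than problem concrete, here is a rough Python sketch. The headline and helper names are invented for illustration, and the "HTML reader" is a stand-in that keeps only text nodes, not a model of any real aggregator. A publisher has to choose between escaping once (for plain-text readers) and escaping twice (for HTML readers), and each choice is wrong for the other group.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TextCollector(HTMLParser):
    """Collects text nodes only, the way a renderer would drop tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def shown_by_html_reader(title_text):
    """Stand-in for a reader that treats the title as HTML."""
    collector = _TextCollector()
    collector.feed(title_text)
    collector.close()
    return "".join(collector.parts)


def shown_by_plain_text_reader(title_text):
    """Stand-in for a reader that treats the title as plain text."""
    return title_text


def parsed_title(title_markup):
    """Wrap markup in a title element and return the text after XML parsing."""
    return ET.fromstring("<title>" + title_markup + "</title>").text


HEADLINE = "Why <b> breaks titles"  # hypothetical headline

escaped_once = parsed_title(escape(HEADLINE))           # aimed at plain-text readers
escaped_twice = parsed_title(escape(escape(HEADLINE)))  # aimed at HTML readers

for label, text in (("once", escaped_once), ("twice", escaped_twice)):
    print(label, "| plain:", shown_by_plain_text_reader(text),
          "| html:", shown_by_html_reader(text))
```

Escaped once, the HTML-style reader swallows the tag. Escaped twice, the plain-text reader shows the entities literally. An ampersand followed by something that looks like an entity name runs into the same choice.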
An RSS title may easily include all of the 90,000+ Unicode characters, save two. Shame I actually need those two: supporting everything except Qaaf and Nuun would have been more useful to me.
Heh. And note that there were two UTF-8 characters, an e with an acute accent and the capital A with a hat that we're all too familiar with, in my comment, and there's no question whose fault it is that they aren't there now. After years in the bowels of Mozilla's bug database, I don't mind that sort of clear and known bug at all - they're like old rowdy friends. It's just the unpredictable ones, that act normal until they go crazy, kill your cat, and then sue you for mental distress, that I can't stand.
I ran the feed against Feedcache and output it via our RSS desktop tool - results here: www.byte.org
Rogers, this is a very clear and concise explanation of one of the biggest problems with the current spec; I thank you for providing something I can point nearly anyone to and expect them to understand.
For completeness, I use Gregarius as my syndication aggregator, and it shows your accented e; I believe it uses the Magpie feed parser at its core, but I haven't a clue if it does postprocessing on any of the elements it receives from Magpie's parsing run.
This one goes way back. Sam Ruby titled a post "Détente" during the 2004 remix of the annual syndication Bimbo Eruption and we all had a good cry about things then.
Keith and angle brackets, y'all.
In my personal opinion, this is very helpful.
Guys, I have also posted some more relevant info on this; not sure if you will find it useful: www.bidmaxhost.com