Adding Atom 1.0 Support to RSS Sites

I switched to Atom 1.0 on Workbench two months ago, a move that hasn't been as smooth as I'd like because of one popular aggregator that doesn't support the format.

This site is created using Wordzilla, a LAMP-based weblog publishing tool that I've developed over the last year. Writing code to generate Atom feeds in PHP was extremely simple, since most of the code used to generate RSS feeds could be applied to the task.

Atom uses a different format for date-time values than RSS, so I had to write new date-handling code:

// get the most recent entry's publication date (a MySQL datetime value)
$pubdate = $entry[0]['pubdate'];
// convert it to an Atom RFC 3339 date in UTC (T and Z are escaped so they print literally)
$updated = gmdate('Y-m-d\TH:i:s\Z', strtotime($pubdate));
// add it to the feed
$output .= "<updated>{$updated}</updated>\n";

This produces a properly formatted Atom date element:

<updated>2006-05-27T11:03:17Z</updated>
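
The conversion depends on the time zone the stored value is read in. As a minimal sketch, assuming the MySQL values are kept in a known zone, PHP's DateTimeImmutable class can make that step explicit. The function name mysql_to_atom_date and its $source_tz parameter are hypothetical and not part of Wordzilla:

```php
<?php
// Hypothetical helper, not from Wordzilla: convert a MySQL datetime string
// to an RFC 3339 UTC timestamp for an Atom <updated> element.
function mysql_to_atom_date(string $mysql_datetime, string $source_tz = 'UTC'): string {
    // parse the value in the time zone it was stored in (an assumption)
    $dt = DateTimeImmutable::createFromFormat(
        'Y-m-d H:i:s',
        $mysql_datetime,
        new DateTimeZone($source_tz)
    );
    if ($dt === false) {
        throw new InvalidArgumentException("Not a MySQL datetime: {$mysql_datetime}");
    }
    // shift to UTC, then print T and Z as literal characters
    return $dt->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d\TH:i:s\Z');
}

echo mysql_to_atom_date('2006-05-27 07:03:17', 'America/New_York'), "\n";
```

With Eastern time as the source zone, 07:03:17 on May 27 comes out as 2006-05-27T11:03:17Z, because daylight time is four hours behind UTC.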

One thing I haven't been able to do with Really Simple Syndication is indicate an item's author, because RSS requires that an e-mail address be used for this purpose and spammers snarf up e-mail addresses in syndicated feeds.

Atom's author element can carry a name, such as a username, instead:

<author>
  <name>rcade</name>
</author>

The most significant difference between RSS and Atom is the requirement that Atom text elements specify the type of content that they hold, which can be HTML, XHTML or text.

The content type must be identified with a type attribute:

<content type="html"><![CDATA[I own some Home Depot stock ...]]></content>

My Atom feed offers the text of weblog entries as HTML markup:

// get the entry's description (a MySQL text value)
$description = $e['description'];
// add it to the feed
$output .= "<content type=\"html\"><![CDATA[{$description}]]></content>\n";

Putting this text inside a CDATA block removes the need to convert the characters "<", ">", and "&" to XML entities.
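
One caveat is that a CDATA block ends at the first "]]>" it contains, so an entry that includes that sequence would close the block early. A minimal sketch of one common workaround follows, which splits the sequence across two CDATA blocks. The helper name cdata_wrap is hypothetical, not something Wordzilla defines:

```php
<?php
// Hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
// in the text so it cannot close the section early.
function cdata_wrap(string $text): string {
    // end the section after "]]" and reopen it before ">"
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

$description = '<p>Code sample: a[b[0]]> 1</p>';
echo '<content type="html">' . cdata_wrap($description) . "</content>\n";
```

An XML parser joins the adjacent CDATA blocks back together, so the reader sees the original text.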

When an Atom element omits the type attribute, it's assumed to be text.

The following PHP code creates XML-safe text for entry titles:

// get the entry's title
$title = $e['title'];
// convert the title to XML-safe text
$title = utf8_encode(htmlspecialchars($title));
// add it to the feed
$output .= "<title>$title</title>\n";
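
The utf8_encode() call assumes titles are stored as ISO-8859-1, and the function is deprecated as of PHP 8.2. If the titles are already UTF-8, which is an assumption about the database and not something stated here, a sketch like this one would do the escaping without it. The helper name xml_text is hypothetical:

```php
<?php
// Hypothetical helper: escape UTF-8 text for use as Atom text content.
// Assumes $text is already valid UTF-8 (an assumption about the database).
function xml_text(string $text): string {
    // ENT_XML1 limits output to XML's predefined entities;
    // ENT_QUOTES also escapes both quote characters
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$title = 'Q&A: Is "RSS" < Atom?';
echo '<title>' . xml_text($title) . "</title>\n";
```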

The last difference I had to deal with is Atom's requirement that each entry have a title. Because I haven't written titles for all entries on Workbench, I wrote a function that creates a title from the start of an entry's description, keeping up to 25 characters and cutting at the last space:

function get_text_excerpt($text, $max_length = 25) {
  $text = strip_tags($text);
  if (strlen($text) <= $max_length) {
    return $text;
  }
  $subtext = substr($text, 0, $max_length);
  $last_space = strrpos($subtext, " ");
  if ($last_space === false) {
    // no space to break on, so return the truncated text, not the whole entry
    return $subtext;
  }
  return substr($subtext, 0, $last_space);
}
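
The function above measures length in characters, since strlen() and substr() count bytes, not words. If a word count is wanted instead, a variant might look like the sketch below. The name get_word_excerpt is hypothetical and the function is not part of Wordzilla:

```php
<?php
// Hypothetical variant: build a title from the first $max_words words
// of an entry's description instead of counting characters.
function get_word_excerpt(string $text, int $max_words = 25): string {
    // remove markup, then split on runs of whitespace
    $words = preg_split('/\s+/', trim(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);
    // keep at most $max_words words and rejoin them with single spaces
    return implode(' ', array_slice($words, 0, $max_words));
}

echo get_word_excerpt('<p>I own some <b>Home Depot</b> stock, so I will vote.</p>', 5), "\n";
```

Called with a limit of five, the example prints "I own some Home Depot".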

I switched to Atom whole hog, dropping the RSS feed and redirecting requests to the new Atom feed.

I quickly reinstated the RSS feed because I'm getting 4,000 requests a week from subscribers running Radio UserLand, which doesn't support Atom 1.0. Trying to subscribe in the current version, Radio 8.2.1, results in the error message "Can't evaluate the expression because the name 'version' hasn't been defined."

That's the only popular aggregator I've tested that doesn't support Atom 1.0, though I've read that the OPML Editor's River of News also can't handle these feeds.

I'm not going to support both formats on new programming projects just for Radio, because its users ought to nudge UserLand to upgrade Atom support to version 1.0. I'd like to redirect RSS requests to the Atom feed so that all subscribers are seeing the same thing and sites like Bloglines offer one subscription count. But dropping existing RSS support makes little sense.

Atom's content type requirement is a great improvement to syndication, allowing publishers to specify exactly what they're using a feed to carry. The RSS engine built into Microsoft's next version of Windows produces RSS 2.0 feeds that have an extra type attribute in each description, even though it's not defined in the spec.

Home Depot's Board Dodges Shareholders

All but one of the 11 members of Home Depot's board of directors were no-shows at the company's annual meeting Thursday, where several proposals questioned the huge executive compensation paid to CEO Robert Nardelli.

Nardelli was the only board member present at the meeting, which ended quickly because he didn't give the customary speech and took no questions from the audience.

In a statement prepared in response to this article, the retailer said that, although its approach to the annual meeting this year was a departure from past practice, it should not be seen as a lack of respect for shareholders or a lessening of its commitment to sound corporate governance and transparency.

I know it's quaint to believe that publicly traded companies serve at the discretion of their shareholders, but this demonstrates outright contempt for the idea:

Each time Mr. Nardelli was addressed by someone presenting a shareholder proposal, he let them speak but then replied firmly that Home Depot's directors had urged voting against it, and moved on to the next agenda item. After about 30 minutes, Mr. Nardelli said "preliminary" results indicated that the company's slate of director nominees had been elected, adding that the only proposal to win a majority of shareholder votes was a recommendation led by the United Brotherhood of Carpenters pension fund to change the director-election process to require that nominees receive a majority of votes.

Mr. Nardelli then adjourned the meeting and left the building.

Home Depot's leadership must not believe there's even the slightest possibility any of them will be accountable for their decisions, and they seem to be right: The entire board was re-elected for another term.

A note on the home page of Planet Apache:

Planet Apache provides its aggregated feeds in RSS 2.0, RSS 1.0 and RSS 0.9, and its blogroll in FOAF and OPML (the most horrific abuse of XML known to man).

RSS 2.0 Specification

'Over the Hedge' Rocks the Suburbs

Took the kids yesterday to see Over the Hedge, an animated comic strip adaptation by DreamWorks about forest creatures who find their home overtaken by a humongous residential community.

Computer-animated films are my favorite family movies these days, because even when the story's dull the rendering effects are worth seeing on a giant screen.

I didn't notice a single new visual in Over the Hedge comparable to the fur in Monsters Inc. or the expressive human faces in The Incredibles, but the movie had something else going for it -- music by Ben Folds.

Folds has at least three songs in the film, each with a few soulful pokes at suburbia, and the end credits include a new version of "Rockin' the Suburbs." Some critics slammed his work -- FilmCritic.Com derides him as an "elevator-music rocker" -- but I liked the songs, though Folds loses something when he can't mix plaintive piano melodies with bile and profanity. (Who else could make the line "give me my money back, you bitch" irresistible to hum?)

The funniest jokes involve the voracious consumerism of the humans, led by a homeowners' association president voiced by Allison Janney. The animals lose their appetite for foraging after being introduced to junk food, including a stackable Pringles-like chip called Spuddies that bears the slogan "Because enough just isn't enough."

When a streetwise raccoon named R.J. explains the world of humans to the other animals, they discover an S.U.V. and marvel at its immense size. "How many humans fit in one?" he's asked.

His answer: "Usually, just one."

I would've liked more incisive digs like that and at least one really grim moment where Folds could crank the pathos up to 11, like the part in Toy Story 2 where Sarah McLachlan sings "When She Loved Me" as Jessie's being discarded by her owner.

But this film played mostly to the intended audience, and my representatives gave it the highest compliment they bestow upon a film in the theater: They danced to the end credits.

RSS: Can't We All Just Get Along?

We made a little history this week in the RSS community. For the first time ever, the publishers of the two competing versions of RSS have agreed on something -- the need for a common RSS MIME type.

Six years ago, a split occurred when two groups laid claim to the name RSS.

Netscape engineer Dan Libby authored RSS 0.90, the first version of the format, in March 1999. The initials stood for "RDF Site Summary" and it made use of the Resource Description Framework, a World Wide Web Consortium (W3C) standard for describing web content so that it's more easily understood by software.

At the urging of Dave Winer of UserLand Software and other early RSS adopters, Libby removed RDF support from RSS 0.91, the second version of the format, upon its July 1999 release, as he explained to the RSS-DEV mailing list:

... the primary users of RSS (Dave Winer the most vocal among them) were asking why it needed to be so complex and why it didn't have support for various features, eg update frequencies. We really had no good answer, given that we weren't using RDF for any useful purpose. ... The compromise was to produce RSS 0.91, which could be validated with any validating XML parser, and which incorporated much of userland's vocabulary, thus removing most (I think) of Dave's major objections. I felt slightly bad about this, but given actual usage at the time, I felt it better suited the needs of its users: simplicity, correctness, and a larger vocabulary, without RDF baggage.

Because the format was no longer built on RDF, the name was changed from "RDF Site Summary" to "Rich Site Summary."

Shortly after the release of RSS 0.91, Netscape stopped publishing its RSS documentation and dropped support for the format on its My Netscape portal, a move that in hindsight ranks among the biggest blunders in web history. The company that made billions by seeing an opportunity in web browsers missed another goldmine in RSS, giving it up just as the format began to spark the boom in blogging and syndication. (They're just now getting back into RSS, with Netscape parent company AOL releasing a new My AOL service that reads syndicated feeds.)

In June 2000, Winer published his own version of RSS 0.91 by fiat, taking input from developers who had continued to use the format in spite of Netscape's abandonment. He explained that it was done without the company's permission or participation, dubbing the new version "Really Simple Syndication."

Neither Winer nor his company, UserLand Software, claimed ownership rights in the RSS format or name. UserLand attempted to register RSS as a trademark in September 2000 but abandoned the effort shortly thereafter.

In December 2000, the RSS-Dev Working Group released a new version of RSS called RSS 1.0, adding RDF back to the format and reviving the name RDF Site Summary. This also was done without Netscape's involvement.

Really Simple Syndication was subsequently released as RSS 2.0 and the two rival RSSes have been battling it out ever since.

Home Depot CEO Builds Huge Nest Egg

I own some Home Depot stock, so I'll be casting 30 of the 2.1 billion votes at the 2006 annual meeting Thursday. The proposals are usually dull, but there's a nice snarky one this year about excessive executive compensation that blasts company CEO Robert Nardelli:

In our view, senior executive compensation at Home Depot has been excessive in recent years. In each of the last three years, CEO Robert Nardelli has been paid a base salary of more than $1,800,000, well in excess of the IRS cap for deductibility of non-performance-based compensation. His bonus in each of those years has been at least $4,000,000, and he was awarded restricted stock valued at over $8,000,000 in 2002, 2003 and 2004. Mr. Nardelli has also received a disturbingly large amount of compensation in form of "loan forgiveness" and tax gross-ups related to that forgiveness, which totaled over $3,000,000 in each of the past three years.

We believe that the current rules governing senior executive compensation do not give stockholders enough influence over pay practices. In the United Kingdom, public companies allow stockholders to cast an advisory vote on the "directors remuneration report." Such a vote isn't binding, but allows stockholders a clear voice which could help reduce excessive pay. U.S. stock exchange listing standards do require shareholder approval of equity-based compensation plans; those plans, however, set general parameters and accord the compensation committee substantial discretion in making awards and establishing performance thresholds for a particular year. Stockholders do not have any mechanism for providing ongoing input on the application of those general standards to individual pay packages. (See Lucian Bebchuk & Jesse Fried, Pay Without Performance 49 (2004))

During the six years Nardelli has led Home Depot, he's earned $154.3 million plus millions more in stock options. The company's stock price dropped 6 percent last year and is lower than when he arrived in 2000, while in the same period, Lowe's delivered a 200 percent return for its shareholders. "The board at Home Depot has rewarded Nardelli for mediocre to poor performance," Paul Lapides, director of the corporate governance center at Kennesaw State University, told the Atlanta Journal-Constitution. "The pay for Lowe's former chairman is a quarter of Nardelli's annual pay, and Lowe's has outperformed Home Depot in the last six years."

Home Depot stacks the deck against shareholder proposals by obscuring the identity of the proponent, and the board of directors recommends a vote for or against each one. (They're against more scrutiny of executive compensation.)

One of the company's largest shareholders, the California Public Employees' Retirement System, came out in favor of this proposal last week.

A second proposal's even more blunt about Nardelli, calling for the company to stop letting one person serve as CEO and chairman of the board:

The pay-for-failure, pay-for-success, pay-for-anything-at-all attitude displayed by our board calls into serious question its effectiveness. ...

It is well to remember that at Enron, WorldCom, Tyco, and other legends of mis-management and/or corruption, the Chairman also served as CEO.

I Enjoy Particularly Rigorous Specs

James E. Robinson III has a confession to make:

I read specs. While sometimes messing with specs turns into a waste of time. Many times understanding the spec can keep you out of trouble. The problem is that specs are tedious, but the reality is that they have to be. Nothing is worse than a poorly written spec.

Being patient and weeding thru specifications helps you understand not just how something is designed to work, but why. I used to read specs because i had to; now i read them because i want to ... even the boring ones.

I have evolved into a spec-reading, spec-writing, specs-crazed dork during my time on the RSS Advisory Board. I now take pride in my personal compliance with RFC 2119, both in the documentation that I write and in everyday conversation. You SHOULD read it. I RECOMMEND it highly.

I used to think there was a virtue in less precise, more readable specs because they are much less intimidating to new implementers of a format. The success of XML-RPC has been driven in part by how easy the spec is to understand at first read.

But making software interoperate well is a hard job that becomes significantly harder when a spec lacks precision. An incredible amount of time can be burned on arguments over interpretation, especially when a programmer is told that his code doesn't meet a spec.

Matt Mullenweg of WordPress, a programmer so militant about web standards that he once called for a boycott of LockerGnome because it used HTML tables for layout instead of Cascading Style Sheets, recently flipped out when WordPress RSS feeds were declared invalid by the Feed Validator.

The validator had been relying on incorrect capitalization of a namespace element called wfw:commentRSS. When informed that it was wfw:commentRss instead, Sam Ruby updated the validator to follow the wfw spec.

Mullenweg, hearing from users expecting him to change his code so that their feeds passed the validator, declared that "the Feed Validator is dead to me":

Here is a post on their mailing list which also explains the issue and includes a link to the archive.org version of the page with the capitialization everyone uses, which was there for at least two years. One line can cause so much trouble.

One line can cause an incredible amount of trouble, which is why every line in a spec has to be precise, thoroughly vetted, and developed within a framework for resolving disagreements and moving on.

I know this will sound ridiculously pedantic to programmers who are sane enough to stay away from specification development, but you have to write these documents in such a rigorous manner that every use of "should," "may," and "must" means exactly the same thing each time it appears. You have to read them like a Supreme Court jurist poring over the U.S. Constitution.

An easy-to-read spec is like an adjustable-rate mortgage. You get in cheap, but you have absolutely no way of knowing how costly it will be in the long run.