Adding Atom 1.0 Support to RSS Sites

I switched to Atom 1.0 on Workbench two months ago, a move that hasn't been as smooth as I'd like because of one popular aggregator that doesn't support the format.

This site is created using Wordzilla, a LAMP-based weblog publishing tool that I've developed over the last year. Writing code to generate Atom feeds in PHP was extremely simple, since most of the code used to generate RSS feeds could be applied to the task.

Atom uses a different format for date-time values than RSS, so I had to write new date-handling code:

// get the most recent entry's publication date (a MySQL datetime value)
$pubdate = $entry[0]['pubdate'];
// convert it to an Atom RFC 3339 date; T and Z are escaped so they are
// output literally, and gmdate() formats in UTC, which is what Z asserts
$updated = gmdate('Y-m-d\TH:i:s\Z', strtotime($pubdate));
// add it to the feed
$output .= "<updated>{$updated}</updated>\n";

This produces a properly formatted Atom date element:

<updated>2006-05-27T11:03:17Z</updated>
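As a rough alternative to the snippet above, the conversion could be wrapped in a small function that parses the MySQL value strictly. This is only a sketch: the name mysql_to_atom_date is made up for illustration, and it assumes the datetime values are stored in UTC.

```php
<?php
// Hypothetical helper: convert a MySQL DATETIME string to an RFC 3339 value.
// Assumption: the stored value is in UTC.
function mysql_to_atom_date(string $mysql_datetime): string {
    $utc = new DateTimeZone('UTC');
    $date = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $mysql_datetime, $utc);
    if ($date === false) {
        throw new InvalidArgumentException("Unparseable datetime: {$mysql_datetime}");
    }
    // T and Z are escaped so they are emitted literally
    return $date->format('Y-m-d\TH:i:s\Z');
}

echo mysql_to_atom_date('2006-05-27 11:03:17'), "\n"; // 2006-05-27T11:03:17Z
```

PHP's built-in DATE_ATOM format constant also yields a valid RFC 3339 value, with a numeric offset such as +00:00 in place of the Z.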

One thing I haven't been able to do with Really Simple Syndication is indicate an item's author, because RSS requires that an e-mail address be used for this purpose. Spammers snarf up e-mail addresses in syndicated feeds.

Atom's author element requires only a name, which can be a username instead of an e-mail address:

<author>
  <name>rcade</name>
</author>

The most significant difference between RSS and Atom is the requirement that Atom text elements specify the type of content that they hold, which can be HTML, XHTML or text.

The content type must be identified with a type attribute:

<content type="html"><![CDATA[I own some Home Depot stock ...]]></content>

My Atom feed offers the text of weblog entries as HTML markup:

// get the entry's description (a MySQL text value)
$description = $e['description'];
// add it to the feed
$output .= "<content type=\"html\"><![CDATA[{$description}]]></content>\n";

Putting this text inside a CDATA block removes the need to convert the characters "<", ">", and "&" to XML entities.
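One caveat: a CDATA block ends at the first "]]>" it contains, so an entry that happens to include that sequence would break the feed. A minimal sketch of one common guard, which splits the sequence across two adjacent CDATA blocks, might look like this. The function name cdata_wrap is hypothetical, not something from Wordzilla.

```php
<?php
// Hypothetical helper: wrap text in a CDATA block, splitting any "]]>"
// inside it so the block is not closed early
function cdata_wrap(string $text): string {
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo cdata_wrap('I own some <b>Home Depot</b> stock'), "\n";
echo cdata_wrap('a ]]> b'), "\n"; // <![CDATA[a ]]]]><![CDATA[> b]]>
```

An XML parser reads the two adjacent blocks back as the original text.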

When an Atom element omits the type attribute, it's assumed to be text.
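To illustrate how the type attribute and the default interact, here is a small sketch of a function that emits either a text or an html element without CDATA. The name atom_text_construct is invented for this example, it assumes the input is already valid UTF-8, and it leaves out the xhtml type, which needs a namespaced div wrapper.

```php
<?php
// Hypothetical helper, not part of Wordzilla: emit an Atom text element.
// Assumes $value is already valid UTF-8.
function atom_text_construct(string $tag, string $value, string $type = 'text'): string {
    if ($type !== 'text' && $type !== 'html') {
        // type="xhtml" needs a namespaced <div> wrapper, which this sketch skips
        throw new InvalidArgumentException("Unsupported type: {$type}");
    }
    // Both types carry escaped character data; for "html" the feed reader
    // unescapes the markup and renders it
    $escaped = htmlspecialchars($value, ENT_XML1 | ENT_COMPAT, 'UTF-8');
    // "text" is the default, so the attribute can be left off
    $attr = ($type === 'text') ? '' : " type=\"{$type}\"";
    return "<{$tag}{$attr}>{$escaped}</{$tag}>";
}

echo atom_text_construct('title', 'AT&T earnings'), "\n";
echo atom_text_construct('summary', '<b>Up</b> 3%', 'html'), "\n";
```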

The following PHP code creates XML-safe text for entry titles:

// get the entry's title
$title = $e['title'];
// convert the title to XML-safe text
$title = utf8_encode(htmlspecialchars($title));
// add it to the feed
$output .= "<title>$title</title>\n";

The last difference I had to deal with is Atom's requirement that each entry have a title. Because I haven't written titles for all entries on Workbench, I wrote a function that can create a title from the first 25 characters of an entry's description, trimmed back to the last whole word:

function get_text_excerpt($text, $max_length = 25) {
  $text = strip_tags($text);
  if (strlen($text) <= $max_length) {
    return $text;
  }
  $subtext = substr($text, 0, $max_length);
  $last_space = strrpos($subtext, " ");
  if ($last_space === false) {
    // no space to break on, so fall back to a hard cut at $max_length
    return $subtext;
  }
  return substr($subtext, 0, $last_space);
}
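For comparison, a variant that counts words rather than characters could look like the sketch below. It decodes HTML entities before splitting so that none is cut in half, a concern raised in the comments. The name get_word_excerpt is hypothetical, the input is assumed to be UTF-8, and the result is plain text that still has to be escaped before it goes into the feed.

```php
<?php
// Hypothetical variant: build a title from the first $max_words words.
// Assumes $html is UTF-8; the returned plain text still needs XML escaping.
function get_word_excerpt(string $html, int $max_words = 25): string {
    // strip markup, then decode entities so each counts as a single character
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        // preg_split() fails on invalid UTF-8 when the u modifier is used
        return '';
    }
    return implode(' ', array_slice($words, 0, $max_words));
}

echo get_word_excerpt('<p>one two three four</p>', 3), "\n"; // one two three
```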

I switched to Atom whole hog, dropping the RSS feed and redirecting requests to the new Atom feed.
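A redirect of this kind can be done in a few lines of PHP. The sketch below is only an illustration: the paths /rss.xml, /index.rdf and /atom.xml are placeholders rather than the actual Workbench feed addresses, and the function name is made up.

```php
<?php
// Hypothetical paths; the real feed URLs on Workbench are not given here
const LEGACY_FEEDS = ['/rss.xml', '/index.rdf'];
const ATOM_FEED = '/atom.xml';

// Return the Atom feed path for an old feed request, or null otherwise
function feed_redirect_target(string $request_path): ?string {
    return in_array($request_path, LEGACY_FEEDS, true) ? ATOM_FEED : null;
}

$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
$target = feed_redirect_target(is_string($path) ? $path : '/');
if ($target !== null && PHP_SAPI !== 'cli') {
    // 301 tells aggregators the move is permanent
    header('Location: ' . $target, true, 301);
    exit;
}
```

A permanent redirect is the kind that aggregators may remember, which makes a later switch back harder to roll out.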

I quickly reinstated the RSS feed because I'm getting 4,000 requests a week from subscribers running Radio UserLand, which doesn't support Atom 1.0. Trying to subscribe in the current version, Radio 8.2.1, results in the error message "Can't evaluate the expression because the name 'version' hasn't been defined."

That's the only popular aggregator I've tested that doesn't support Atom 1.0, though I've read that the OPML Editor's River of News also can't handle these feeds.

I'm not going to support both formats on new programming projects just for Radio, because its users ought to nudge UserLand to upgrade Atom support to version 1.0. I'd like to redirect RSS requests to the Atom feed so that all subscribers are seeing the same thing and sites like Bloglines offer one subscription count. But dropping existing RSS support makes little sense.

Atom's content type requirement is a great improvement to syndication, allowing publishers to specify exactly what they're using a feed to carry. The RSS engine built in to Microsoft's next version of Windows produces RSS 2.0 feeds that have an extra type attribute in each description, even though it's not defined in the spec.

Comments

Looks like your HTML got munged in many different places in the posting. Good luck unravelling it!

Heh. Looks like you caught it about the same time I did....

I edited the post about 40 times after publication to get the HTML and XML formatting correct. Ugh.

date(DATE_ATOM); produces a valid RFC 3339 date (as of PHP 5.1.3).

"The following PHP code creates XML-safe text for entry titles:
// get the entry's title
$title = $e['title'];
// convert the title to XML-safe text
$title = utf8_encode(htmlspecialchars($title));
// add it to the feed
$output .= "<title>{$e['title']}</title>\n";"

I assume you meant to write $output .= "<title>$title</title>\n"; here? :)

If you are going to go through the bother of producing an RSS feed on behalf of the Radio UserLand users, just remember that Radio UserLand doesn't support utf-8, and expects titles to be encoded HTML (what happened to the parent of that comment thread?)

I recently did some feed rationalizing and decided that I didn't want users to see what the feed format was from its URI (thereby reserving the right to change it without anyone having to change their URIs).

Currently I provide only one feed per blog and every URI that asks for anything that looks like a feed is getting redirected there. Currently it's RSS2 (with dc:creator tags for author name), but I'd be happy to change to Atom if there is ever a need.

Rogers:

I'm working on Atom 1.0 support for Radio. Basic support can be found here:

houseofwarwick.com

Formal and correct support will come in an upcoming Radio.root release.

Steve

> I've read that the OPML Editor's River of News also can't handle these feeds.

True. Workbench stopped flowing for me, and it looks like it remembered the redirect, because it didn't switch back until I manually intervened just now.

Thanks for the info, glad to be getting your flow again.

I don't see it in your most recent edit, but when creating the title, be careful not to prematurely truncate HTML entities such as   - I learned this the hard way after writing a remarkably similar function.

Sigh, I escaped it to show up (which it did in preview mode, but not on final publish). That's an HTML non-breaking space in the previous comment which you can't see. Ampersand nbsp semi-colon.

Oh, now I see what the problem is: Can't post backslashes on this site.

I meant to say: In the format string for the date function, be sure to escape T and Z, so they are not replaced with the time expressions they denote.
