With the current interest in rssCloud and PubSubHubbub (PuSH), I've been thinking about all the bandwidth that's consumed by the RSS elements that describe the feed. When a client requests an RSS feed 10 times in one day, it gets the basic details of the feed over and over again. When clients request the Workbench feed, they get 1,800 characters containing optional RSS elements that I haven't changed in years, except for the PuSH element I added last month. Workbench has 1,900 feed subscribers, so if they average 10 checks a day, they're consuming 32 megabytes every day on information they know already.
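The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures quoted above, and it assumes one byte per character, which holds for ASCII text encoded as UTF-8.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
channel_chars = 1_800    # characters of channel-level elements per response
subscribers = 1_900      # feed subscribers
checks_per_day = 10      # assumed average number of polls per subscriber

# Assumption: one byte per character (true for ASCII text in UTF-8).
bytes_per_day = channel_chars * subscribers * checks_per_day

print(f"{bytes_per_day:,} bytes per day")
print(f"{bytes_per_day / 2**20:.1f} MiB per day")
```

Counting a megabyte as 2**20 bytes, the total lands near the 32 megabytes mentioned above.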
James Holderness directed me to RFC3229+feed, a method to request partial RSS feeds that omit items that a client has already seen. That's useful and has been adopted by some feed publishers and clients, but as far as I can determine, the approach still sends all of the channel elements that describe the feed itself. I wanted to float an idea here to see if it would be useful:
<rssboard:feedDetails>
http://ekzemplo.com/feedinfo.rss
</rssboard:feedDetails>
This channel-level RSS element identifies a URL that contains the full details about the feed. The details would be expressed as an RSS feed without any item elements.
An optional ttl attribute could contain the number of days the publisher would like clients to cache the information before checking it again:
<rssboard:feedDetails ttl="30">
http://ekzemplo.com/feedinfo.rss
</rssboard:feedDetails>
A feed publisher who wished to make use of this could move all channel elements except for title, link, description and atom:link to the detail URL. Title, link and description are required in RSS, and atom:link identifies the feed's URL so it can't be moved.
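To make the proposal concrete, here is a minimal sketch of how a client might read the element from a slimmed-down feed and decide when its cached copy of the details has expired. The namespace URI, the sample feed contents, and the function names are all assumptions made for illustration; the proposal does not define any of them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical namespace URI; no URI is defined for the rssboard prefix here.
RSSBOARD_NS = "http://example.com/ns/rssboard"

# A slimmed-down feed that keeps only the channel elements that cannot move.
SLIM_FEED = f"""<rss version="2.0" xmlns:rssboard="{RSSBOARD_NS}">
  <channel>
    <title>Ekzemplo</title>
    <link>http://ekzemplo.com/</link>
    <description>An example feed</description>
    <rssboard:feedDetails ttl="30">
      http://ekzemplo.com/feedinfo.rss
    </rssboard:feedDetails>
  </channel>
</rss>"""


def read_feed_details(feed_xml):
    """Return (details_url, ttl_days), or None if the element is absent."""
    channel = ET.fromstring(feed_xml).find("channel")
    element = channel.find(f"{{{RSSBOARD_NS}}}feedDetails")
    if element is None:
        return None
    ttl = element.get("ttl")
    ttl_days = int(ttl) if ttl is not None else None
    return element.text.strip(), ttl_days


def details_are_stale(fetched_at, ttl_days, now):
    """True once the cached details are at least ttl_days old."""
    return now - fetched_at >= timedelta(days=ttl_days)


url, ttl_days = read_feed_details(SLIM_FEED)
fetched_at = datetime(2009, 9, 1, tzinfo=timezone.utc)
print(url, ttl_days)
print(details_are_stale(fetched_at, ttl_days, datetime(2009, 9, 15, tzinfo=timezone.utc)))
```

A client following this sketch would download the detail URL once, remember when it did so, and skip the second request until the ttl runs out.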
Instead of ttl, why not insert the date/time when the channel information was last updated? That way, there's never a delay in notifying clients of changes, and there's never a need to check for changes when there aren't any.
That's a great idea. A lastUpdated attribute with an RSS date-time value would be better than ttl.
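As a rough illustration of the lastUpdated idea, a client could compare the advertised date-time with the one it stored the last time it downloaded the details. The function name is hypothetical, and the sketch assumes both values are RFC 822 date-times that carry a time zone such as GMT.

```python
from email.utils import parsedate_to_datetime


def details_changed(last_updated_attr, cached_last_updated):
    """True if the advertised lastUpdated is newer than the cached value.

    Assumes both strings are RFC 822 date-times with a time zone (e.g. GMT).
    """
    if cached_last_updated is None:
        # Nothing cached yet, so the details must be fetched.
        return True
    advertised = parsedate_to_datetime(last_updated_attr)
    cached = parsedate_to_datetime(cached_last_updated)
    return advertised > cached


print(details_changed("Tue, 15 Sep 2009 14:00:00 GMT",
                      "Mon, 10 Aug 2009 09:30:00 GMT"))
```

With this approach the client only requests the detail URL when the date-time in the feed moves forward.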
A similar solution to feedDetails is XInclude. It's already well documented and understood. A downside of these approaches is that they actually increase the number of connections, which often matters more than bandwidth when you are scaling RSS.
On the publisher's end, it only costs one more connection for each time the feed details are downloaded, which could be once a week or once a month. That shouldn't be a scaling issue.
What about good old ETag and Last-Modified (or if you're really feeling sporty, Cache-Control)? If you're worried about bandwidth, Content-Length: 0 is the best solution of all.
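For readers who haven't used conditional GET, here is a minimal client-side sketch using Python's standard library. The function names and the shape of the cache dictionary are assumptions for illustration. The client sends back the validators it saved from the previous response, and a 304 Not Modified reply means the cached copy is still good.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that carries any validators saved earlier."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, cache):
    """Fetch url, reusing cache["body"] when the server answers 304."""
    request = build_conditional_request(
        url, cache.get("etag"), cache.get("last_modified"))
    try:
        with urllib.request.urlopen(request) as response:
            # The feed changed: store the new body and its validators.
            cache["etag"] = response.headers.get("ETag")
            cache["last_modified"] = response.headers.get("Last-Modified")
            cache["body"] = response.read()
    except urllib.error.HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError.
        if error.code != 304:
            raise
    return cache.get("body")
```

When nothing has changed, the server's reply has no body at all, so the channel elements are not sent again.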
+1 with Gordon Weakliem's comment about conditional GET. Also, isn't the response compressed using gzip or deflate for the majority of clients?
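The effect of compression on channel metadata is easy to check. The sketch below gzips a made-up block of channel elements; the sample content is purely an assumption, so the sizes it prints say nothing about any real feed.

```python
import gzip

# Made-up channel metadata, standing in for a real feed's optional elements.
channel_block = """<language>en-us</language>
<copyright>Copyright 2009 Ekzemplo</copyright>
<managingEditor>editor@ekzemplo.com (Example Editor)</managingEditor>
<webMaster>webmaster@ekzemplo.com (Example Webmaster)</webMaster>
<category>Example category one</category>
<category>Example category two</category>
<category>Example category three</category>
<generator>Example Generator 1.0</generator>
<docs>http://ekzemplo.com/docs/rss</docs>
<image>
  <url>http://ekzemplo.com/images/logo.png</url>
  <title>Ekzemplo</title>
  <link>http://ekzemplo.com/</link>
</image>
"""

raw = channel_block.encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

Repetitive XML like this tends to compress well, which narrows the savings available from moving the elements elsewhere.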
I guess the bandwidth argument would hold for Facebook et al., who would see obvious cost savings from shaving 200 bytes off each request. I appreciate the idea of splitting out content that has different rates of change, but I'm not convinced that there's a real problem in the current form.