Strict parsing boosts formats like Atom

Brent Simmons is willing to make the NetNewsWire news aggregator a hard-ass about parsing Atom files. His software is more lenient with RSS, which might suggest that he's favoring that format, but in reality being strict will help Atom. In a post on his weblog, Tim Bray explains why:

An Atom feed is going to be defined as an XML document, which means that if it's not well-formed then it's not Atom. All it needs is for one (I repeat, one) popular newsreader with a large installed base to enforce this policy (stop parsing and display an error to the subscriber) to turn this from de jure to de facto reality. This works because Atom doesn't have an installed base.
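
Bray's policy of stopping at the first well-formedness error is easy to express in code. Here's a minimal Java sketch, assuming only the standard JAXP DOM parser that ships with the JDK. The class name StrictFeedReader and its checkWellFormed method are hypothetical and are not NetNewsWire's actual code. The point is that any parser error ends the read and produces a message for the subscriber, with no attempt at recovery.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reject any feed that is not well-formed XML.
class StrictFeedReader {

    // Returns null when the bytes parse as well-formed XML,
    // otherwise a message suitable for showing to the subscriber.
    static String checkWellFormed(byte[] feedBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default handler (which prints to stderr) with one
            // that turns every error into an exception: no recovery attempts.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            // Pass raw bytes so the parser itself detects and enforces the encoding.
            builder.parse(new ByteArrayInputStream(feedBytes));
            return null;
        } catch (SAXParseException e) {
            return "Not well-formed at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "Could not parse feed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] bad = "<feed><title>Unclosed</feed>".getBytes(StandardCharsets.UTF_8);
        String problem = checkWellFormed(bad);
        if (problem != null) {
            // Stop here and show the error instead of guessing what was meant.
            System.out.println(problem);
        }
    }
}
```

Handing the parser bytes rather than a String matters: if the code decoded the bytes itself first, it could paper over an encoding error before the parser ever saw it.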

After wrestling for days trying to write Java code to read ChefMoz RDF database files, which claim to be UTF-8-encoded XML but turn out to be neither UTF-8 nor XML, I strongly support strict parsing for new formats. Without it, there's no penalty for being non-compliant, which leads to lots of bad data propped up by workarounds that have to be dealt with forever.
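
Part of the trouble with files like these is the encoding claim. Java's convenience path, new String(bytes, StandardCharsets.UTF_8), silently replaces bad bytes with U+FFFD, so mislabeled data slips through. Here's a minimal sketch of the strict alternative, assuming only the standard java.nio.charset classes. The StrictUtf8 class and its methods are hypothetical names for illustration, not the code I actually wrote for ChefMoz.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode bytes as UTF-8 and fail on the first bad sequence.
class StrictUtf8 {

    static String decode(byte[] data) throws CharacterCodingException {
        // A fresh decoder from newDecoder() already reports errors by default;
        // the actions are set explicitly so the intent is visible.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    static boolean isValidUtf8(byte[] data) {
        try {
            decode(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café" saved as ISO-8859-1 but labeled UTF-8: the lone 0xE9 byte is invalid UTF-8.
        byte[] mislabeled = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(mislabeled));                                  // false
        System.out.println(isValidUtf8("caf\u00e9".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Rejecting the file outright puts the penalty where it belongs, on the publisher, instead of on every program that has to read the data.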
