The RSS Advisory Board site now includes all of the articles, weblog entries, and comments from the group's old Manila site, dating back to the group's founding in 2004.
I never got a copy of the old site's root file from Harvard, so I collected the content using an obscure but cool feature of Manila: All site content is saved in the discussion board as individual messages, each of which can be downloaded as an OPML file. For example, open this weblog entry from Craig Burton's Manila blog in OPML format.
I wrote a Java application that used Apache HttpClient to download the files and XOM to process the OPML.
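The download step could look roughly like the sketch below. It is not the original application. It swaps Apache HttpClient for the `java.net.http.HttpClient` built into Java 11+, so the example needs no external jars. The class name `OpmlDownloader` and the example URL are hypothetical, and the real address pattern of Manila's OPML message exports is not shown here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical downloader; uses the JDK's HttpClient in place of Apache HttpClient.
public class OpmlDownloader {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .build();

    /** Builds a plain GET request for one message's OPML export. */
    public static HttpRequest requestFor(URI messageUri) {
        return HttpRequest.newBuilder(messageUri)
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
    }

    /** Writes the response body to target, then rejects anything but HTTP 200. */
    public Path download(URI messageUri, Path target) throws IOException, InterruptedException {
        HttpResponse<Path> response =
                client.send(requestFor(messageUri), HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + messageUri);
        }
        return response.body();
    }
}
```

Looping `download` over a list of message URLs, one file per message, would produce the local pile of OPML files that the parsing step works from.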
OPML sucks, but I got thousands of weblog files into a MySQL database, so I can't complain. Manila stores message text in the text attribute of outline elements, some of which may be nested. Weblog entries are formatted using the most insane thing I've ever seen in an XML dialect:
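Pulling the text attributes out of nested outline elements comes down to a depth-first walk. The sketch below uses the JDK's DOM parser (JAXP) as a stand-in for XOM, which the original code used. `OutlineText` and `collectText` are hypothetical names. One useful side effect: the parser unescapes attribute values, so `&lt;title&gt;` comes back as the literal string `<title>`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; JAXP DOM stands in for XOM here.
public class OutlineText {
    /** Returns every outline's text attribute in document (depth-first) order. */
    public static List<String> collectText(String opml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(opml)));
        List<String> texts = new ArrayList<>();
        walk(doc.getDocumentElement(), texts);
        return texts;
    }

    private static void walk(Element element, List<String> texts) {
        // Record this outline's text before descending into its children.
        if ("outline".equals(element.getTagName()) && element.hasAttribute("text")) {
            texts.add(element.getAttribute("text"));
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, texts);
            }
        }
    }
}
```

Each string in the returned list is then ready to be inserted as part of a row in the database.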
<outline text="&lt;newsItem&gt;"/>
<outline text="&lt;title&gt;Hackers selling IDs for $14, Symantec says&lt;/title&gt;"/>
<outline text="&lt;url&gt;&lt;/url&gt;"/>
<outline text="&lt;/newsItem&gt;"/>
You need to be an XML dork to appreciate this, but it's XML elements stored as escaped markup inside XML attributes.
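Recovering the fields therefore takes a second parse. The first parse unescapes the attribute values. Joining them in order produces a small XML document that can be parsed again. The sketch below assumes the joined strings for one entry form well-formed XML, which holds for the example above but may not hold for every message. `NewsItemDecoder` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical decoder for markup stored inside outline text attributes.
public class NewsItemDecoder {
    /** Fields recovered from one newsItem. */
    public static final class NewsItem {
        public final String title;
        public final String url;

        NewsItem(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** Joins already-unescaped outline texts and parses them as XML (assumes well-formedness). */
    public static NewsItem decode(List<String> outlineTexts) throws Exception {
        String markup = String.join("", outlineTexts);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        Element item = doc.getDocumentElement();
        return new NewsItem(childText(item, "title"), childText(item, "url"));
    }

    // Text of the first descendant with the given name, or "" if it is absent.
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }
}
```

Fed the four strings from the example, `decode` returns the headline as `title` and an empty `url`.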
