The RSS Advisory Board site now includes all of the articles, weblog entries, and comments from the group's old Manila site, dating back to the group's founding in 2004.
I never got a copy of the old site's root file from Harvard, so I collected the content using an obscure but cool feature of Manila: All site content is saved in the discussion board as individual messages, each of which can be downloaded as an OPML file. For example, open this weblog entry from Craig Burton's Manila blog in OPML format.
I wrote a Java application that used Apache HttpClient to download the files and XOM to process the OPML.
OPML sucks, but I got thousands of weblog files into a MySQL database, so I can't complain. Manila stores message text in the text attribute of outline elements, some of which may be nested. Weblog entries are formatted using the most insane thing I've ever seen in an XML dialect:
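As a rough illustration of pulling message text out of nested outline elements, here is a minimal sketch. It is not the application described above: it swaps XOM for the JDK's built-in DOM parser (javax.xml.parsers) so it has no dependencies, it skips the HttpClient download and the MySQL step, and the class and method names are hypothetical. It walks the document depth-first and collects each outline's text attribute in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text attribute of every outline, including nested ones.
class OutlineWalker {

    static List<String> outlineTexts(String opml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(opml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OPML", e);
        }
        List<String> texts = new ArrayList<>();
        collect(root, texts);
        return texts;
    }

    // Depth-first recursion, so a child outline's text follows its parent's.
    private static void collect(Element element, List<String> texts) {
        if ("outline".equals(element.getTagName()) && element.hasAttribute("text")) {
            texts.add(element.getAttribute("text"));
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, texts);
            }
        }
    }

    public static void main(String[] args) {
        String opml = "<opml version=\"1.1\"><head><title>Sample</title></head><body>"
                + "<outline text=\"First paragraph\">"
                + "<outline text=\"Nested paragraph\"/>"
                + "</outline>"
                + "<outline text=\"Second paragraph\"/>"
                + "</body></opml>";
        System.out.println(outlineTexts(opml));
        // prints [First paragraph, Nested paragraph, Second paragraph]
    }
}
```

Flattening the tree this way loses the nesting; if the hierarchy matters for reassembling a post, you would carry the depth along with each text value.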
<outline text="<title>Hackers selling IDs for $14, Symantec says</title>"/>
You need to be an XML dork to appreciate this, but these are XML elements stored as escaped markup inside an XML attribute.
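Getting at a field like the title therefore takes two parses: the first turns the attribute's &lt;title&gt; back into <title>, and the second parses that string as XML in its own right. The sketch below shows the idea with the JDK's DOM parser rather than XOM. The class name and the synthetic <entry> wrapper are hypothetical; the wrapper is an assumption that guards against an attribute holding several sibling elements, which would not be well-formed on its own.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical decoder for outline elements whose text attribute holds escaped markup.
class ManilaEntryDecoder {

    // Parses a string as XML and returns its root element.
    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    // Pass 1 parses the outline; the parser unescapes &lt; and &gt; in the attribute.
    // Pass 2 wraps the recovered markup in a synthetic <entry> root and parses it again.
    static Map<String, String> decode(String outlineXml) {
        String innerMarkup = parseRoot(outlineXml).getAttribute("text");
        Element entry = parseRoot("<entry>" + innerMarkup + "</entry>");
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = entry.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String outline = "<outline text=\"&lt;title&gt;Hackers selling IDs for $14, "
                + "Symantec says&lt;/title&gt;\"/>";
        System.out.println(decode(outline));
        // prints {title=Hackers selling IDs for $14, Symantec says}
    }
}
```

One caveat: the second parse is strict XML, so an entry containing HTML-only entities such as &nbsp; would be rejected and need some pre-cleaning first.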
I subscribe to your site, having purchased your Userland Kickstart book, which I no longer need since I've moved to Wordpress and Rapidweaver for my sites.
I'd be interested to know whether any of these tools would be usable (by someone who didn't really get to grips with everything in that book) for exporting my OPML Editor site into Wordpress.
I've tried, but unlike moving from Radio to Wordpress, which was easy (apart from losing linked headers), I've found no way to do this other than one post at a time.
Does the OPML Editor publish a copy of the site in OPML format? I thought it did, but I can't find one for any of the OPML Editor blogs.
"<outline text="&lt;title&gt;Hackers selling IDs for $14, Symantec says&lt;/title&gt;"/>"
Wow. Thanks for sharing that. There are things in this world I would not believe unless I'd seen them with my own eyes.
By the way, the comment preview feature didn't really work for my last comment. I had to hand-edit the brackets back to &lt;.