Sam Gentile lost his Radio Userland data in a hard drive crash, leaving him with the HTML files associated with his weblog (and his categories) but none of the data in weblogData.root, the database where a weblog is stored in Radio Userland.

I'm going to scrape the HTML files for his entries using Java and rebuild weblogData.root, probably by creating an XML version of his entire site as an intermediate step. Does anyone know of any Radio Userland verbs or third-party tools that will post weblog entries to the past?
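Here's a minimal sketch of the scraping step I have in mind. It assumes Radio's archive pages mark each post with a permalink anchor like `<a name="a65">` (the same `#a65` fragment you see in Radio permalink URLs), splits the page on those anchors, and wraps the fragments in a simple intermediate XML format. The anchor regex and the `<weblog>`/`<post>` XML shape are my own assumptions, not anything Radio defines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RadioScraper {
    // Matches Radio-style permalink anchors such as <a name="a65">.
    // Assumption: every post in the archive page starts with one of these.
    private static final Pattern ANCHOR =
        Pattern.compile("<a name=\"a(\\d+)\"", Pattern.CASE_INSENSITIVE);

    // Splits an archive page into one HTML fragment per entry, in page order.
    public static List<String> extractEntries(String html) {
        List<String> entries = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        int start = -1;
        while (m.find()) {
            if (start >= 0) {
                entries.add(html.substring(start, m.start()).trim());
            }
            start = m.start();
        }
        if (start >= 0) {
            entries.add(html.substring(start).trim());
        }
        return entries;
    }

    // Wraps the scraped entries in a hypothetical intermediate XML document;
    // CDATA keeps the raw entry HTML from breaking the XML.
    public static String toXml(List<String> entries) {
        StringBuilder xml = new StringBuilder("<weblog>\n");
        for (String entry : entries) {
            xml.append("  <post><![CDATA[").append(entry).append("]]></post>\n");
        }
        return xml.append("</weblog>\n").toString();
    }
}
```

A second pass would read the generated XML and write each `<post>` back into weblogData.root, which is the part I'm hoping an existing Radio verb or tool already handles.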

Comments

Hi Rogers,

I already wrote a script that crawls Sam's archives to re-create the weblogData.posts table using his archives.

Lawrence

Great. He also has three categories that need to be crawled: net, scienceFiction, and xmlAndWebServices:

http://radio.weblogs.com/0105852/categories/

Have you published the script anywhere? I'd like to put my old non-Radio weblog into Workbench, so I'll be needing a similar technique to the script I was going to write for Sam.

Thanks for the help, both of you. The first script wiped out my postings from the last two days with some error. I do have another one from you, but of course, at the worst possible time, my cable is out and I can't do anything. I am at work now and will try tonight.

This is a really, really bad situation. The file I was given wiped out my last two days' postings. Now my site is totally blank. I hope UserLand realizes the amount of traffic and influence that I have in the .NET world and how much traffic (and new users) I have been driving to this site. As one blogger put it: http://radio.weblogs.com/0108189/2002/07/04.html#a65
A lot of people will owe you a debt of gratitude. Sam's blog is an extremely valuable resource to me and, I'm sure, many others. Getting him back up and running is key.

It will also be very comforting to know that such a tool exists. I am still going to implement a regular backup strategy, but it will still be nice to know there is an extra safety net. It never hurts to have more redundancy. Thanks, Rogers.

* I guess I don't understand why scraping and all this is necessary. All the HTML files were "scraped" and collected by Rogers. I have them all locally now. Why on earth doesn't Radio take them all out of the directory and upload them? What more is needed? I mean, it's not as if Radio is built on rocket science. Transfer files out of a local directory onto a server. What more is there?

This is very frustrating and may be more trouble than it's worth. I will give up one more night of time tonight when I get home. That's it. Then I'm off to Movable Type with my other bloggers.
