Converting a WordPress Blog to HTML Files


I've been doing more programming lately, primarily in Java because I am writing several books that teach the language. I have a few big announcements coming soon about those projects.

My current coding effort is an application that turns a no-longer-updated WordPress blog into a set of static HTML pages. The goal is to make it easier to retire a blog while keeping the content available in the form that's most likely to be future-proof and extremely simple to move around.

WordPress can export a blog's pages, entries and comments to a single XML file. The export file is an RSS feed extended with several namespaces, which the company has dubbed WordPress eXtended RSS (WXR). To create a WXR file of your blog, go to your WordPress dashboard and choose Tools, Export. A page opens with an Export command that creates the file and initiates the download to your computer.

Although the WXR format isn't documented, any programmer who has worked with RSS feeds can figure out the purpose of most elements just by looking at an export file in a text editor.
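To give a sense of how readable the format is, here's a minimal Java sketch that pulls the titles of published posts out of WXR text. It isn't the converter itself. The class and method names are hypothetical, the sample feed is made up, and it assumes the export uses the wp: prefix along with the wp:post_type and wp:status elements that show up inside each item of an export file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration only.
public class WxrSketch {

    // Text of the first descendant element with this tag name, or "" if there is none.
    static String textOf(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Titles of every item whose wp:post_type is "post" and wp:status is "publish".
    static List<String> publishedPostTitles(String wxr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // The factory is not namespace-aware by default, so prefixed names
            // such as "wp:post_type" can be matched as plain strings.
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wxr.getBytes(StandardCharsets.UTF_8)));

            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                boolean isPost = textOf(item, "wp:post_type").equals("post");
                boolean isPublished = textOf(item, "wp:status").equals("publish");
                if (isPost && isPublished) {
                    titles.add(textOf(item, "title"));
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WXR text", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: one published post and one page.
        String sample =
            "<rss version=\"2.0\" xmlns:wp=\"http://wordpress.org/export/1.2/\">"
            + "<channel>"
            + "<item><title>Hello</title>"
            + "<wp:post_type>post</wp:post_type><wp:status>publish</wp:status></item>"
            + "<item><title>About</title>"
            + "<wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>"
            + "</channel></rss>";
        System.out.println(publishedPostTitles(sample));
    }
}
```

Matching on the prefixed names as plain strings keeps the sketch short, but it would break on a file that binds the same namespace to a different prefix. A sturdier version would turn on namespace awareness and match on the namespace URI declared at the top of the export.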

I could use some guinea pigs, so if you have a WordPress blog and are willing to share its WXR file, I can send a copy back to you as a static web site. Send me an email or comment and we'll arrange how to get the file to me.


Why not use something like HTTrack to crawl the WP site and grind out static versions of all the pages, instead of dealing with the WXR export?

I ask because I have worked with WXR files in the past, and while the format is mostly easy to understand without any reference, every time I thought I had it down I would find some new wrinkle in another blog's export file that screwed up my converter. Making a WXR converter that works for a given blog is easy; making one that works for all blogs is not. (At least, that was my experience.)

I want to be able to render the blog again later with a different web design and possibly change how archive and tag pages are organized. If I just use Wget to download the site, those things aren't possible.

Also, if an edit becomes necessary, I'd rather do it in the WXR file than edit HTML pages by hand.
