W3C Serves 130 Million XML DTDs Per Day

A Java application I wrote that reads several dozen RSS feeds started running into trouble with the W3C. Feeds failed with HTTP 503 "Service Unavailable" errors like this one:

Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

At first I assumed the problem was temporary, since an HTTP 503 response is defined to mean that a server is temporarily overloaded or undergoing maintenance.

However, the W3C Systems Team announced in February 2008 that their XML DTD files draw so much traffic that they now send 503 errors to bandwidth-hogging XML clients that request the files too often:

... we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven't changed in years. ...

A while ago we put a system in place to monitor our servers for abusive request patterns and send 503 Service Unavailable responses with custom text depending on the nature of the abuse. Our hope was that the authors of misbehaving software and the administrators of sites who deployed it would notice these errors and make the necessary fixes to the software responsible.

But many of these systems continue to re-request the same DTDs from our site thousands of times over, even after we have been serving them nothing but 503 errors for hours or days.

Although the problem went away for reasons I don't yet understand, I'm looking for a way to read local copies of the XML DTDs with the XOM Java XML library. XOM doesn't yet support XML Catalogs, an OASIS standard for mapping external identifiers such as DTD URLs to local resources.

Comments

When you create a XOM Builder, pass in an org.xml.sax.XMLReader with a custom EntityResolver. The Javadocs have examples.
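
The suggestion above can be sketched with nothing but the JDK's SAX classes. This is a minimal illustration, not XOM's documented recipe: the class name LocalDtdResolver, the newReader helper, and the local dtds/ directory are all hypothetical, and the code assumes you have already saved copies of the DTDs to disk.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical resolver that redirects selected DTD system IDs to local files. */
public class LocalDtdResolver implements EntityResolver {

    // Maps a remote system ID (the DTD URL) to a local copy on disk.
    private final Map<String, Path> localCopies;

    public LocalDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = new HashMap<>(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        Path local = localCopies.get(systemId);
        if (local == null) {
            // Not mapped: returning null tells the parser to use its default lookup.
            return null;
        }
        // Point the parser at the local file by handing it a file: URI.
        InputSource source = new InputSource(local.toUri().toString());
        source.setPublicId(publicId);
        return source;
    }

    /** Creates a namespace-aware SAX XMLReader that uses the given resolver. */
    public static XMLReader newReader(EntityResolver resolver)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        return reader;
    }
}
```

To use it, map http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd to a file such as dtds/xhtml1-strict.dtd, call LocalDtdResolver.newReader(new LocalDtdResolver(map)), and pass the resulting reader to XOM's Builder(XMLReader) constructor. Because the resolver returns a file: URI, the entity files that the XHTML DTD pulls in by relative name (xhtml-lat1.ent, xhtml-symbol.ent, and xhtml-special.ent) should be looked up next to the local DTD, so copies of those need to sit in the same directory.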
