W3C Serves 130 Million XML DTDs Per Day

A Java application I wrote that reads several dozen RSS feeds started running into trouble with the W3C. Feeds failed with HTTP 503 "Service Unavailable" errors like this one:

Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

At first I thought this was a temporary error. HTTP 503 errors are defined to indicate that a server is temporarily overloaded or undergoing maintenance.

However, the W3C Systems Team announced in February 2008 that their XML DTD files were drawing so much traffic that they had begun sending 503 errors to bandwidth-hogging XML clients that request the files too often:

... we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven't changed in years. ...

A while ago we put a system in place to monitor our servers for abusive request patterns and send 503 Service Unavailable responses with custom text depending on the nature of the abuse. Our hope was that the authors of misbehaving software and the administrators of sites who deployed it would notice these errors and make the necessary fixes to the software responsible.

But many of these systems continue to re-request the same DTDs from our site thousands of times over, even after we have been serving them nothing but 503 errors for hours or days.

Although the problem went away for reasons I don't yet understand, I'm looking for a way to read local copies of the XML DTDs with the XOM Java XML library. XOM doesn't yet support XML Catalogs, an XML standard for handling this kind of issue.
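One possible workaround, sketched here under assumptions rather than offered as a tested fix: the SAX parser underneath a Java XML library can be given an EntityResolver that answers requests for known DTD URLs from local files. The class name LocalDtdResolver and the file:///opt/dtds/ path in the usage note are hypothetical. Only JDK classes are used in the block itself.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Answers requests for known DTD URLs from local copies instead of the network. */
class LocalDtdResolver implements EntityResolver {
    // Hypothetical mapping: remote DTD URL -> URI of a local copy.
    private final Map<String, String> localCopies;

    LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = new HashMap<>(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUri = localCopies.get(systemId);
        if (localUri == null) {
            // Returning null tells the parser to fetch the system ID itself.
            return null;
        }
        // Using the local URI as the system ID means relative references
        // inside the DTD resolve against the local directory as well.
        InputSource source = new InputSource(localUri);
        source.setPublicId(publicId);
        return source;
    }

    /** Builds a JDK SAX reader that consults the local copies first. */
    static XMLReader newReader(Map<String, String> localCopies)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new LocalDtdResolver(localCopies));
        return reader;
    }
}
```

To use it, map http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd to something like file:///opt/dtds/xhtml1-strict.dtd and hand the resulting reader to the XML library. With XOM, that would presumably mean passing it to the Builder constructor that takes an XMLReader, which I have not verified against these feeds. The XHTML DTDs pull in entity files by relative path, so those files would need to sit next to the local DTD copy.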

Huffington Post Censors Jesse Ventura on 9/11

A March 9 commentary submitted to Huffington Post by former Minnesota governor Jesse Ventura was removed after publication by the site, which replaced it with a note stating that contributors are banned from engaging in conspiracy theories:

Editor's Note: The Huffington Post's editorial policy, laid out in our blogger guidelines, prohibits the promotion and promulgation of conspiracy theories -- including those about 9/11. As such, we have removed this post.

"I can't believe the Huffington Post today will practice censorship. I've got news for them," Ventura responded to the action. "I won't ever write for em again."

I get tired of a lot of the conspiracy stuff posted by users on the Drudge Retort, which gets 2-4 posts a day from Infowars, Prison Planet and similar sites, but I've never banned it. I know it's difficult for Huffington Post to deal with fringe stuff -- the conservative group blog Red State kicked off birthers and truthers last month -- but the Post is doing a public disservice by allowing no discussion at all on a subject. Ventura is a former governor. When prominent people challenge the government, the idea that their views should be censored on the grounds they are a "conspiracy theory" is antithetical to open debate in a free society. Any far-out idea could be dismissed as conspiracist. Would the Post have censored Jim Garrison from writing about the Kennedy assassination? The site is running Jenny McCarthy's dangerous autism vaccine quackery, a view widely discredited by medical experts.

To combat the censorship, I republished Ventura's censored 9/11 commentary yesterday and gave it major news banana treatment on the Retort:

You didn't see anything about it in the mainstream media, but at a recent conference in San Francisco, more than 1,000 architects and engineers signed a petition demanding that Congress begin a new investigation into the destruction of the three World Trade Center skyscrapers on 9-11.

That's right, these people put their reputations in potential jeopardy -- because they don't buy the government's version of events. They want to know how 200,000 tons of steel disintegrated and fell to the ground in 11 seconds. They question whether the hijacked planes were responsible or whether it could have been a controlled demolition from inside that brought down the twin towers and WTC Building 7.

His views aren't faring too well in the Retort discussion. But they deserve to be heard.

Boston Herald: Alabama Shooter Played D&D

A story I missed last month: After University of Alabama-Huntsville professor Amy Bishop was arrested for shooting up her faculty department, Boston Herald reporter Laurel J. Sweet blew the lid off a shocking angle of the crime: Bishop was an avid player of role-playing games.

Accused campus killer Amy Bishop was a devotee of Dungeons & Dragons -- just like Michael "Mucko" McDermott, the lone gunman behind the devastating workplace killings at Edgewater Technology in Wakefield in 2000.

Bishop, now a University of Alabama professor, and her husband James Anderson met and fell in love in a Dungeons & Dragons club while biology students at Northeastern University in the early 1980s, and were heavily into the fantasy role-playing board game, a source told the Herald.

"They even acted this crap out," the source said.

I didn't think the press was still capable of anti-D&D hysteria like this. Back in the '80s, Joe McGinniss wrote a ridiculous true-crime book on some murderer who blamed D&D for his crime, Tom Hanks starred in the anti-D&D TV movie Mazes and Monsters and grieving mother Patricia Pulling began the scare group Bothered About Dungeons and Dragons, blaming the game for her teen-age son's suicide.

But these days, D&D and role-playing games are about as controversial as Yahtzee. Millions of people played the game as kids and grew up without worshiping the occult or committing murders. The deaths of game creators Gary Gygax and Dave Arneson in recent years were major news covered across the globe, sparking countless remembrances by people who huddled around a table with dice when we could've been experimenting with drugs and alcohol. (I was going to say drugs, sex and alcohol, but who am I kidding?) Today, millions of people play MMORPGs and other videogames that are D&D in everything but name.

Like out and proud D&D geek Stephen Colbert, I was a dungeon master in my youth (and not the cool kind who wears assless leather chaps and ties women up on torture wheels in my basement). I've managed to reach middle age without killing anybody at all, not even a single drifter or truck stop prostitute.

The Heavy: David Letterman Likes Them Now

On Jan. 18, the British band The Heavy impressed David Letterman so much with their song "How You Like Me Now?" that he did something he's never done before in the history of his program -- he asked for an encore.

The YouTube video is the televised broadcast -- which edits out most of the encore -- but you can see it in full in high quality on Letterman's web site. Paul Shaffer and Letterman even perform part of the encore.

There have been some great live performances on Letterman, including TV on the Radio's "Wolf Like Me" and Phoenix's "1901," but that one tops them all.

Google Flags MSNBC.Com as Malware Site

I was reading news stories this afternoon on MSNBC when one of its pages triggered a malware warning in Google Chrome:

The website at www.msnbc.msn.com contains elements from the site adrotator.mediaplex.feed-mnptr.com, which appears to host malware -- software that can hurt your computer or otherwise operate without your consent. Just visiting a site that contains malware can infect your computer.

According to Google's safe browsing alert for that feed-mnptr.com domain, it has contained three trojan programs and five browser security exploits. The domain has been used as an intermediary to infect users of Digg, CNBC and MSN.Com.

I can't check without visiting the MSNBC page, which would be extremely dumb, but based on the domain the malware appears to be coming in from a third-party ad service. There was a report Wednesday that the Drudge Report had hosted malware, probably from an ad network.

AP Keeps Accused Rapist's Name Secret

The Associated Press reported today on a 51-year-old New Jersey man facing trial for raping five of his daughters, three of whom allegedly bore his children from the assaults. He faces 27 charges including sexual assault, child endangerment and criminal sexual contact, but the wire service has decided not to name him in its coverage:

The Associated Press generally doesn't identify victims of sexual crimes and is not reporting the names of the husband and wife to protect the identities of their children, now all over 18 years of age.

The longstanding media policy to shield some crime victims from being identified has always been a questionable one, since people who suffer rape aren't the only victims who might be harmed by the publicity generated by a trial. Here, though, the policy has been extended to the perpetrator of a crime.

I question whether in a free society it is acceptable to put someone on trial and potentially imprison them while never revealing the person's name to the public. What if someone has information pertaining to the accused that ought to be known to police? What if other victims are out there who might never know to come forward unless told of the arrest?

In any case, the web has made it considerably more difficult for information of this kind to stay secret. The New Jersey Star-Ledger and New York Daily News identify the accused rapist as Aswad Ayinde.

Deterring Spammers with Fake MX Records

For the past 48 hours, I've been dealing with a Sendmail server that was shutting down frequently with a load average above 13. The server's getting flooded constantly with spam attempts to non-existent users on more than 100 domains.

I've set up Sendmail to use a virtusertable that rejects every invalid email address with a "user unknown" error. This is helpful, but Sendmail still has to take the time to reject each spam attempt. Since only six of the domains on the server receive any mail at all, I wanted to find a way to stop Sendmail from receiving any requests for the rest.

After doing some research, I decided to try setting a fake MX record for the domains that do not send or receive mail. Here's how MX records are set for these domains:

IN MX 10 mail.example.com.

There's no mail server associated with that hostname.

On servers that do exchange email, fake MX records can be used to deter spammers. Most email servers are equipped to deal with mail servers that are unavailable. They queue the outgoing mail and try an alternate mail server, if one has been defined for the domain. Spam software can't take the time to queue an outgoing mail for delivery later because it is sending millions of messages. If it finds a mail server that's unavailable, it gives up and goes on to the next server.

Putting fake servers as the first and last MX record in a domain supposedly discourages spammers without affecting the receipt of legitimate email. Spammers hit the fakes and give up. Legitimate mail servers hit a fake, then try the next option and deliver the mail.

Here's how MX records can be set to achieve this:

IN MX 10 mail1.example.com.
IN MX 20 mail2.example.com.
IN MX 30 mail3.example.com.

The mail1.example.com and mail3.example.com servers are fakes that don't resolve properly. The functioning mail server is at mail2.example.com.
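The reasoning behind that layout can be sketched as a small simulation. This is a toy model under stated assumptions, not a description of how any particular mail server or spam tool is written. The class and method names are hypothetical, "reachable" stands in for a real SMTP connection attempt, and the second spam bot (one that targets only the highest-numbered record) is my assumption about why a fake last record would help.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class MxFallbackDemo {
    /** One MX record: lower preference numbers are tried first. */
    static final class MxRecord {
        final int preference;
        final String host;

        MxRecord(int preference, String host) {
            this.preference = preference;
            this.host = host;
        }
    }

    /** A legitimate mail server walks the records in order until one accepts. */
    static Optional<String> deliverLikeMailServer(List<MxRecord> records, Set<String> reachable) {
        return records.stream()
                .sorted(Comparator.comparingInt((MxRecord r) -> r.preference))
                .map(r -> r.host)
                .filter(reachable::contains)
                .findFirst();
    }

    /** Assumed spam behavior: try only the lowest-numbered record, then give up. */
    static Optional<String> spamBotTryingFirstOnly(List<MxRecord> records, Set<String> reachable) {
        return records.stream()
                .min(Comparator.comparingInt((MxRecord r) -> r.preference))
                .map(r -> r.host)
                .filter(reachable::contains);
    }

    /** Assumed spam behavior: try only the highest-numbered record, then give up. */
    static Optional<String> spamBotTryingLastOnly(List<MxRecord> records, Set<String> reachable) {
        return records.stream()
                .max(Comparator.comparingInt((MxRecord r) -> r.preference))
                .map(r -> r.host)
                .filter(reachable::contains);
    }
}
```

Fed the three records above with only mail2.example.com reachable, deliverLikeMailServer lands on mail2.example.com while both spam bot methods come back empty. Real senders retry and queue in more complicated ways, so this only illustrates the ordering logic.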

So far, the approach appears to work. Legitimate email is getting through and most domains aren't getting any spam attempts at all.