Canning Comment Spam

Workbench has been under attack lately by a comment spammer linking to dozens of cheesy .info domains. The sites sell drugs like Cialis and Phentermine, offer Texas Holdem poker, and pimp a bunch of other get-rich-click schemes.

I'm writing my own software here in PHP and MySQL, so I'm trying to deal with this abuse as painlessly as possible.

For several hours at a time, a new spam comment is posted every one to two minutes across the 2,300 weblog entries on this site. In my comment management tool, I added a Delete Comment and Ban IP button, which removes the message and blocks the IP address the spammer used.
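
As a rough illustration of what a button like that does, here is a minimal sketch. It is not the site's actual code: the function names are hypothetical, and plain PHP arrays stand in for the MySQL tables that would hold comments and banned addresses.

```php
<?php
// Hypothetical sketch: arrays stand in for the comment and ban tables.

// Remove one comment and remember the IP address it was posted from.
function delete_comment_and_ban_ip(array &$comments, array &$banned_ips, int $comment_id): void
{
    if (!isset($comments[$comment_id])) {
        return; // unknown comment, nothing to delete or ban
    }
    $banned_ips[$comments[$comment_id]['ip']] = true;
    unset($comments[$comment_id]);
}

// Check an address against the ban list before accepting a new comment.
function is_banned(array $banned_ips, string $ip): bool
{
    return isset($banned_ips[$ip]);
}
```

The ban list only helps when the spammer reuses an address, since each new IP has to be seen once before it can be blocked.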

Unfortunately, the spammer has a wide range of IPs at his disposal. In the last three days, I've banned 40 different addresses in Spain, Puerto Rico, Uruguay, and other countries. The comments are coming from new IP addresses as fast as I can ban them.

Plan B: I am now counting the number of hyperlinks in posted comments with the following PHP code:

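// Reject comments that contain more than three links.
if (substr_count(strtolower($comment), "a href") > 3) {
    // Log the attempt, then redirect back to the comment page and stop.
    error_log("Attempt to post four or more links from $ip_address");
    header("Location: /workbench/comment/$dex");
    exit;
}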

Comments with more than three links are not accepted.
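
A plain substring count misses anchors written with extra whitespace or with other attributes before href. One possible variation, offered as a sketch rather than as the code running on this site, counts anchor tags with a regular expression. The helper names count_links() and too_many_links() are made up for the example.

```php
<?php
// Hypothetical helpers, not the code used on this site.

// Count <a> tags that carry an href attribute, ignoring case,
// spacing, and attribute order.
function count_links(string $comment): int
{
    $count = preg_match_all('/<a\b[^>]*\bhref\s*=/i', $comment);
    return $count === false ? 0 : $count; // treat a regex error as zero links
}

// True when a comment has more links than the allowed maximum.
function too_many_links(string $comment, int $max_links = 3): bool
{
    return count_links($comment) > $max_links;
}
```

This still counts only anchor tags. Bare URLs typed as text are not counted, so a stricter filter would need a separate check for those.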

Comments

I've had the same exact problem on my MT-powered blog from spammers using very similar .info domains to the ones you're seeing. My site has a very small readership and a lot fewer pages, so it is less of a target, but it is interesting that the same spammer(s) seem to be targeting a wide swath of the blogosphere. For me, MT-Blacklist is doing a fine job of moderating.

I wrote a Movable Type text filter that will remove all links from a comment if a user-designated maximum is hit. I don't know if it's needed for MT-Blacklist users, since I haven't used that plug-in yet.

The latest MT-Blacklist (v2.01b) can intercept comments that contain in excess of a user-defined number of URLs. The default value is 5 URLs. It automatically holds the comment pending moderation. It does the same for comments on entries older than a certain number of days, 14 days being the default. These two features work well.
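
For anyone writing their own software in PHP, the two rules described above could be approximated along these lines. This is only a sketch of the idea, not MT-Blacklist's implementation: the function name, its parameters, and the simple http/https pattern used to count URLs are all assumptions, and the defaults are the 5-URL and 14-day values quoted above.

```php
<?php
// Hypothetical helper, not MT-Blacklist's code.
// Hold a comment for moderation when it contains more than $max_urls URLs
// or when the entry it is posted on is older than $max_age_days days.
function should_hold_for_moderation(
    string $comment,
    int $entry_timestamp,
    int $now,
    int $max_urls = 5,
    int $max_age_days = 14
): bool {
    // Count URL schemes as a rough proxy for the number of URLs.
    $url_count = preg_match_all('~https?://~i', $comment);
    if ($url_count === false) {
        $url_count = 0; // treat a regex error as zero URLs
    }
    $age_days = ($now - $entry_timestamp) / 86400; // seconds per day
    return $url_count > $max_urls || $age_days > $max_age_days;
}
```

A held comment would be stored unpublished until someone approves it, so a false positive costs a delay rather than a lost comment.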

The HIP ("type the characters you see") tests tend to work very well. Some wiseguy will say, "they can be hacked", but show me a comment spammer with the chops to do that. There is an article on doing this in ASP.NET at msdn.microsoft.com. I suspect there are similar samples for PHP, Python, and Perl -- it would also be cool to see someone integrate something like this into Frontier.

I'd like to avoid the type-in-characters method if I can. I don't want to lose some feedback from people who think, as I do, that it's annoying as hell.

Just a thought, Rogers. I frequently post on DrudgeRetort and include a handful of links. The thinkers on there sometimes demand quite the heavy load of proof. Perhaps requiring registration for posts with more than three links would help.

I use MTBlacklist 2 and it's halted every single one of the spams so far. True, it has also halted a few genuine comments, but I keep a close eye on my e-mail and can quickly verify a true comment and also quickly use MTBlacklist to stop further attempts at spamming. IP blocking doesn't work very well, as you have found out.

Having just had your new MT Bible book delivered yesterday, I am a little perturbed to find that you don't use MTBlacklist! But that is your choice, and from what I have read so far (I have only been able to dip quickly into the sections that interest me) it looks like a great book!

Thanks. At the time I finished the manuscript, MT-Blacklist had not been updated for MT 3, so I covered other plugins.

I'm running MT on the Drudge Retort, which at 2,400 entries and 97,000 comments is running into some scaling issues.

You might want to check out Brad Choate's open proxy comment filter:

bradchoate.com

You should check out the code for this in WordPress and the latest stuff we're doing in the CVS version to address comment spam, especially if you're working in PHP and MySQL as well.
