Detecting Weblog Spam with Comment Flak

Because I don't want to add captchas to Workbench, this weblog has been drowning in comment spam. Since I began accepting comments in September 2002, I've received 13,000 legitimate comments and 172,000 spam comments.

I'm trying a new technique this week that makes spam easy to detect by putting a bunch of bogus text areas on a weblog form, hiding them with Cascading Style Sheets, and checking them for input when the comment is submitted. I call these fields comment flak.
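As a rough illustration of the form side of this idea, here is a minimal sketch in Python rather than the PHP library itself. The field names (`comment`, `body`, `message`, `text`), the `flak` CSS class, and the `render_comment_form` function are all hypothetical, chosen only for this example.

```python
from html import escape

# Hypothetical names for illustration; the real library may use different ones.
REAL_FIELD = "comment"
FLAK_FIELDS = ["body", "message", "text"]


def render_comment_form(action_url: str) -> str:
    """Return an HTML form with one real text area and several hidden decoys."""
    parts = [
        # The CSS rule hides every decoy from people using a browser.
        "<style>.flak { display: none; }</style>",
        f'<form method="post" action="{escape(action_url, quote=True)}">',
    ]
    # Decoy text areas: a human never sees them, so they should stay empty.
    for name in FLAK_FIELDS:
        parts.append(f'<textarea class="flak" name="{escape(name, quote=True)}"></textarea>')
    # The one text area a human visitor actually sees and fills in.
    parts.append(f'<textarea name="{REAL_FIELD}"></textarea>')
    parts.append('<input type="submit" value="Post">')
    parts.append("</form>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_comment_form("/comments/add"))
```

Giving the decoys plausible names is a guess at what makes them attractive to a spam script; whether that matters in practice is untested here.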

Spammers typically put their junk comment in every text area on a form. When text shows up in any of these flak fields, my blogging software treats it as spam.
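The server-side check can be sketched in a few lines. This is an illustrative Python version, not the code from the PHP library: the decoy field names and the `is_spam` function are assumptions made for the example, and the submitted form is modeled as a plain dictionary.

```python
# Hypothetical decoy field names; they must match whatever the form uses.
FLAK_FIELDS = ("body", "message", "text")


def is_spam(form: dict) -> bool:
    """Return True if any hidden decoy field arrived with text in it."""
    # A missing or whitespace-only decoy counts as empty.
    return any(str(form.get(name, "")).strip() for name in FLAK_FIELDS)


if __name__ == "__main__":
    human = {"comment": "Nice post!"}
    bot = {"comment": "cheap pills", "body": "cheap pills", "message": "cheap pills"}
    print(is_spam(human))  # False
    print(is_spam(bot))    # True
```

A single filled decoy is enough to flag the submission, which matches the "any of these flak fields" rule described above.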

I've written a new Comment-Flak library for PHP that makes it easy to use this technique on any weblog published with PHP.

So far, 100 percent of the spam submitted to this weblog has been caught by this technique. This will drop if the technique becomes popular, but I'm hoping people will offer tips on how to make it harder to beat. The code has been released as open source under the GPL.

Comments

Now isn't that clever. Of course any system that becomes popular because it is successful will draw the effort necessary to ensure its defeat -- perhaps releasing the code was a mistake? Even talking about the technique?

I have a similar system hacked into my WordPress install, and it has been very successful so far (in use for more than a year).

Kudos for resisting captchas :)

I like the Akismet plugin for WordPress. It catches almost all of my comment spam. I haven't seen one slip by for a few months now.

I was going to try Akismet, but it kept rejecting my attempts to comment on Aaron Swartz's blog, claiming they were spam.

Please don't say, "Victory is mine", because then either God or Dave Winer will say, "Vengeance is mine."

The Spam Karma 2 plugin for WordPress is an excellent collection of spam triggers and detection mechanisms.

Highly recommended

Great product and idea!

Nice work. Hopefully it will be hard to detect.

Does it actually work?
Seems not.
