Akismet vs. Two Billion Spam

 By Shamus Jul 11, 2007 24 comments

Akismet, the anti-spam WordPress plugin, has been around since November 2005. In that time, the software has dealt with 2 billion spam messages. What’s really alarming is the shape of the curve. To be fair, some of the curve is the result of more and more people using WordPress, and more of those people getting Akismet, but still.

Such a mammoth waste of everyone’s time and energy for just a tiny bit of money for a minuscule number of people.

The spam solution I’m using is still going strong. It’s been 2 weeks since the last time I saw a spam. It’s been over a month since one slipped by that I had to delete manually. Given the sheer volume of spam I was getting five months ago, and given the fact that this site is several times larger now, I’m very grateful for how well the CAPTCHA is working.

If you look at the problem from the POV of the spam programmer, there are many ways to make his job harder and more annoying. You can’t make it impossible, of course, but the appeal of spam has always been the fact that it is “free” for the spammer. Making it less free might go a long way to making less of it. Given the normal level of laziness and stupidity of the average spammer, I think that even CAPTCHAs are probably overkill.

Most spam scripts go right for the WordPress comment-posting script. Just giving this script a configurable name would probably be just as effective as the CAPTCHA solution I’m using now.

Another technique would be to simply insist that comment POSTs are the result of an honest-to-goodness page load. Embed a secret number (which changes automatically) into the form as a hidden field, and make sure incoming form submissions contain the number. The advantage of this would be that it would be seamless and transparent to normal users – they wouldn’t even need to enter a CAPTCHA. The only downside: if a user loaded the page, did something else for a couple of hours, and then came back and left a comment on the open page without reloading it first, their number would have expired and the system would eat their comment. The disadvantage for the spammer is that they will have to fetch the page and parse all that HTML if they want their comment to get through.
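
Here is a rough sketch of that hidden-field idea, written in Python rather than WordPress's PHP. It is only one way to do it: instead of storing a changing number on the server, it embeds a timestamp signed with a private key, so the "number" expires on its own. The names `SECRET_KEY`, `MAX_AGE_SECONDS`, `issue_token` and `verify_token` are made up for illustration, and the two-hour lifetime is an arbitrary assumption.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values: a real site would load the key from private config.
SECRET_KEY = b"replace-with-a-long-random-value"
MAX_AGE_SECONDS = 2 * 60 * 60  # assumed lifetime of a form token


def _sign(timestamp: str) -> str:
    """Return a hex HMAC-SHA256 signature of the timestamp string."""
    return hmac.new(SECRET_KEY, timestamp.encode("ascii"), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Build the value to embed in the hidden form field at page load."""
    timestamp = str(int(time.time() if now is None else now))
    return f"{timestamp}:{_sign(timestamp)}"


def verify_token(token: str, now: Optional[float] = None) -> bool:
    """Accept a submission only if its token is genuine and not expired."""
    current = time.time() if now is None else now
    timestamp, sep, signature = token.partition(":")
    # Reject anything that is not "<digits>:<signature>".
    if not sep or not (timestamp.isascii() and timestamp.isdigit()):
        return False
    expected = _sign(timestamp)
    # Constant-time comparison, done on bytes so odd input cannot raise.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    age = current - int(timestamp)
    return 0 <= age <= MAX_AGE_SECONDS
```

The page template would write `issue_token()` into a hidden input, and the comment handler would call `verify_token()` on the submitted value before accepting the comment. A spam script that posts straight to the handler without loading the page has no valid token to send. A longer `MAX_AGE_SECONDS` trades a little security for fewer eaten comments.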


24 comments. Hurry up and add yours before it becomes passé.


  1. Deoxy says:

    If they didn’t expire for 24 hours, that would solve that problem.

    And yes, spammers, generally, need to be beaten within an inch of their life… notice that “within” could be plus OR MINUS and still be in the margin of error…

  2. Marmot says:

    2 billion!!!

    Incredible. Well, I realize that WordPress as such is a big institution/website/however you want to call it, but that amount is still staggering. I wonder if there are stats for that somewhere worldwide?

    It’s great to see your solution works so well; I shudder from trying to imagine a fragment of those 2 billion heading over to here…

  3. AngiePen says:

    I get comment spam even on my LJ, which is just weird. What’s even weirder, and proves your statement about how stupid the spammers are, is that inevitably the spam appears on posts which are weeks or months or even a year or more old. No one’s reading that stuff anymore so what’s the point for the spammer? [eyeroll]

    It looks like they’re going for posts which have a lot of incoming links but if the topic has exhausted itself and all the readers have moved on, what are the spammers getting out of that? It’s like tossing flyers in the window of an abandoned house. Idiots. :/

    Angie

  4. Gropos says:

    Insert girder

    Insert girder

    Insert girder

    I read a story on how one of the top 10 email spammers worldwide was caught and there was a noticeable GLOBAL impact on the speed of the intratubes.

  5. RodeoClown says:

    [I]f the topic has exhausted itself and all the readers have moved on, what are the spammers getting out of that?

    They are getting google rankings. Readers may have moved on, but google keeps re-checking the pages, and the links TO the spammer’s site give it a higher page rank and move it up google’s result list.

  6. Tarlen says:

    My forum used to get about 3 or 4 spam bots signing up every day, despite having a captcha on the registration screen.

    I “improved” the captcha, and the number of bots signing up didn’t change, but I did get mails from users saying they couldn’t read it.

    So I added a question to the sign up – Are you human? Haven’t had a single piece of spam since then.

    I think they tend to work on known principles. phpBB is a well known target, with well known anti-spam techniques. If the captcha method you use became extremely popular, and the install was always the same, I think it would start letting stuff through much more.

    The trick is to make it subtly different on every site. Hopefully the spam-bots can’t easily be made to recognise those differences.

  7. Mario says:

    Couldn’t you set the page to reload every couple of hours automatically? Then the only people harmed would be the ones that wrote a comment, didn’t post it, and then left the page open for hours.

  8. Luke says:

    Shamus, there are plugins that already do this. I can’t recall them off the top of my head, but I think lately I saw 2 or 3 existing ones that rename the comment fields to throw off the spammers.

    The secret number thing has also been done:

    http://wordpress-plugins.feifei.us/hashcash/

    I think HashCash computes a hash of the comment post on the client side via JavaScript, and plugs the result into a hidden field. Requiring JavaScript to post is a usability issue, but then so is CAPTCHA, so I guess it’s a fair trade-off.

  9. I installed Spam Karma 2 on my wordpress blog towards the beginning of the year. Since then it has caught 442 spam messages with no false positives or negatives.

  10. Heh, had a similar problem with bots wanting to up their pagerank registering on our forums with suspect sites as their website (we use phpBB2, so there was a risk to start with). We managed to solve this by simply telling everyone to sign up with student as their occupation, and modifying the php to reject any other value – if you can just slightly modify popular software in some way, spam vanishes unless someone actively targets your site

  11. Alden says:

    I inserted decoy comment forms into my source code, which seems to throw off a lot of spammers’ scripts. The few spams which get to the right place tend to be promptly eaten by the anti-spam plugin I have (SpamLookup for Movable Type).

  12. Hal says:

    Y’know, I noticed the expiration problem with the CAPTCHAs in blogger. I have them enabled for my blog, and they’ve cut down on the spam. (To think that I was getting spam? Who am I?)

    The thing is, if I’m writing a long comment, the captcha on-screen has usually expired by the time I’m finished with my comment. Necessary, perhaps, but annoying all the same.

  13. There’s an even simpler captcha solution for the wiki I’m maintaining (Oddmuse): It’s just a field that asks a plain-text question, and the answer is checked when you submit. Works for visually impaired people, too. And the only question I ever ask on my own blog is “Tell me a number between one and ten,” and I accept any one of the ten words and ten numerals as a correct answer. Works for me.

  14. Roxysteve says:

    I like the bot-prufe method which displays a number of pictures and asks for the choice of one based on human reactions to them. Two frownies and a smiley, say, and a question like “which one just won the lottery?”

    Way way OTT for most purposes though.

    I was astounded by how successful Shamus’s elegant CAPTCHA was.

    Steve.

  15. AngiePen says:

    RodeoClown — of course. [facepalm] I didn’t think of that. Luckily I’m online most of my waking hours (well, maybe it’s not that lucky but anyway…) so I catch and delete them pretty quickly, often within minutes.

    Angie

  16. Hal–is *that* what’s going on? I’ve wondered why sometimes my comments on blogger get spat out; luckily if I hit the back button and resubmit it generally works, which again never made sense to me before.

  17. Author says:

    A lot of what we discuss here has to be normalized on the attractiveness of the spam target. For example, spammers employ humans to break captchas when they think it’s worth the trouble. As Shamus’ site climbs popularity ranks, it is bound to get more attention. Actually, I’m surprised this is not happening already.

  18. Anachronda says:

    Heh, had a similar problem with bots wanting to up their pagerank registering on our forums with suspect sites as their website (we use phpBB2, so there was a risk to start with). We managed to solve this by simply telling everyone to sign up with student as their occupation, and modifying the php to reject any other value – if you can just slightly modify popular software in some way, spam vanishes unless someone actively targets your site

    I modified the forum I run for my WoW guild to reject registration requests that include a web site. Once registered, a user can still modify their profile to add a web site, but can no longer specify one when they create their account. Haven’t had any trouble since.

  19. Miral says:

    The one I don’t get is when I receive link-spam through my private feedback form. This just emails directly to me, it doesn’t display on the website or anything, so what’s the point?

    Anyway, there’s already a system in WP that detects potentially evil uses of admin pages (“nonces”), so this could probably be extended to the comment-posting pages without too much difficulty.

  20. Craig says:

    I wonder how much the number of easily spammable/orphan blogs has increased in the same time period. My guess is that while more bloggers are getting better tools, the number of blogs is increasing far more rapidly, so in the long run I guess the spammers are winning the battle.

    Thankfully while they may be reaching their goal (increased pagerank, etc), the blogs that I read are getting less and less spam. It’s almost win-win, if spammers winning was desirable.

  21. Matt` says:

    I think I heard somewhere that if you hired a load of Chinese people to post spam it would cost less than the normal methods

  22. Rob says:

    I had the same idea, Shamus and it worked fine (although it took a couple of tries to get it right). I have access to both the server-side and the client-side code for my web application, however, which is a different situation than someone using WordPress.

    Deflecting Comment Spam

  23. Sean Hagen says:

    I was having some major spam problems on my blog ( each post had about 270-odd spam comments ) until I re-wrote part of the commenting system ( my entire website is self-written ).

    The first layer of defense is a system I came up with after seeing the image chooser used by CSS Squirrel. I took that idea and ran with it: added a few more images, but the three are randomly chosen when displayed.

    I also started using Akismet. Two layers of protection baby, yeah!

    Anyways, since I put those updates in place, haven’t had a single spam comment.

    What I found interesting about the spam comments, though, was that the email address the spammer was inputting was the one found on my ‘About Me’ page.

  24. Hello, I believe your website might be having internet browser compatibility problems. When I look at your web site in Safari, it looks fine but when opening in I.E., it’s got some overlapping issues.

    I just wanted to give you a quick heads up! Apart from that, wonderful blog!
