Akismet and Spam comments

By Shamus
on Feb 4, 2007
Filed under:
Notices

Ok, htAkismet is now enabled. So if you can’t read this you’ll know why. Er, waitasecond

Background: There are two programs we’re working with here: Akismet examines posted comments, and if they meet some secret criteria then it flags them as spam and holds them for me to moderate later. htAkismet looks at the IPs of these flagged comments, and bans some of them. Sometimes.

htAkismet is a little vague on how it works, but the docs hint that an IP address must spam me more than once before it gets banned. Still, for people with dynamic IPs or who share an IP, this might eventually become a problem. Worse, the ban is done by adding the offending IPs to the .htaccess file, and I don’t see any way of un-banning IPs once they get banned. (I’m not going to descend into madness and edit that sucker myself.)
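
To make the worry concrete, here is a rough sketch of the kind of logic the docs seem to hint at. This is not htAkismet's actual code. The threshold of two flagged comments, the marker comments, and every function name are assumptions made for illustration, and the `Deny from` lines assume Apache 2.2-style access rules (a real file would also need the surrounding `Order` directives). Keeping the bans between two markers is one possible way to make them removable later without hand-editing the rest of the file.

```python
from collections import Counter

# Hypothetical markers; htAkismet may not delimit its entries at all.
BEGIN = "# BEGIN spam-bans (hypothetical marker)"
END = "# END spam-bans (hypothetical marker)"


def ips_to_ban(flagged_ips, threshold=2):
    """Return the sorted IPs that were flagged as spam at least `threshold` times."""
    counts = Counter(flagged_ips)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def render_ban_block(ips):
    """Build the delimited block of ban lines (Apache 2.2-style syntax assumed)."""
    lines = [BEGIN]
    lines += [f"Deny from {ip}" for ip in ips]
    lines.append(END)
    return "\n".join(lines)


def replace_ban_block(htaccess_text, ips):
    """Swap in a fresh ban block, or append one, leaving other rules untouched."""
    block = render_ban_block(ips)
    start = htaccess_text.find(BEGIN)
    end = htaccess_text.find(END)
    if start != -1 and end > start:
        # An old block exists: replace everything between (and including) the markers.
        return htaccess_text[:start] + block + htaccess_text[end + len(END):]
    # No old block: append, making sure the existing text ends with a newline.
    sep = "" if not htaccess_text or htaccess_text.endswith("\n") else "\n"
    return htaccess_text + sep + block + "\n"
```

Calling `replace_ban_block(text, [])` leaves an empty block behind, which is the un-ban path I can't find in the real plugin. Note that counting by IP alone is exactly what hurts people on shared or dynamic addresses.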

Akismet has a habit of “picking on” certain readers for no reason I’ve been able to discern, and flagging their comments as spam no matter how many times I approve their comments. Now add to this the fact that htAkismet will look for repeat offenders and ban them. I’m seeing the opportunity for emergent stupidity here, with the added bonus that once it screws up I’ll have no way of knowing. And if I do find out, I have no easy way of fixing it.

I’m less and less keen on this idea.

Still, I’m hoping that if htAkismet can cut down on the volume of crap I have to sort, then I can deal with the remaining stuff by manually reviewing flagged comments. I did this for months, and only stopped once the volume overwhelmed me.

Also: My spam trap has about 2,500 comments in it. Before I enabled htAkismet I plowed through several pages of them and rescued a half dozen legit comments, so if you see comments showing up in the middle of long threads, you’ll know why.


18 comments. (18 is the only non-zero number that equals twice the sum of its decimal digits.)

  1. hmm. What happens if I specify a non-dotinfo domain as my website? let’s see…

  2. and let’s now see what happens if I use my dotinfo domain…

  3. Shamus says:

    Wow. It worked!

    EDIT: No it didn’t. Arg. That’s it. The first comment made it through, and the second was flagged as spam.

    I wonder if I can hack Akismet to stop doing that. Some of these plugins can get a little hairy, but if it’s hard-coded to have a problem with .info then THAT part should be easy to find. I’ll have to investigate further.

  4. Shamus says:

    So it blindly blacklists all .info domains. That is a clumsy and hackish solution.

  5. Julia says:

    That is a clumsy solution.

    Any way to whitelist specific .info domains? If not, I know one person I won’t be recommending this site to…. :(

  6. Nick says:

    I’ve been using a Bayesian spam filter on my anime weblog since I got it up and it’s been serving me well. Yeah, there is a bit of a training period and different types of spam might find ways around it at first (and you get the occasional spammer that tries to poison the filters) which is what Akismet, as a distributed spam-filtering system, tries to avoid or at least lessen the effect of (I’m sure there are some Akismet poisoning attempts, which might explain why my post that linked to Dictionary.com got flagged).

    It would seem a good idea for there to be a hybrid system. If you mark something as a false-positive in Akismet, it would be good if the code running on your site would do some white-listing or weighting on your end. I understand that Akismet only returns a “Yes/No” type answer to prevent hacking to reverse engineer the spam-checking system being used, but at least you should be able to do some spam-checking that’s more specific to your site (i.e., whitelisting Aziz, etc.).

    I actually just put up another blog that I’ll be running Akismet on, so I’ll have an idea of their performance side-by-side.

  7. Andre says:

    I settled on a two-fold spam solution that seems to work for me. First, anybody who doesn’t log in before commenting needs to have their comments approved by me. Second, in order to register or comment without logging in, you need to pass a simple captcha test. That seems to solve it, as I haven’t had any comment spam since then. I thought about turning on trackbacks, but then I decided that was a bad idea.

    Hope this works for you, Shamus.

  8. Rask says:

    If you can read this comment, then it is infinitely less interesting, because it’s not one of the comments blocked by your spam filter.

  9. Shamus says:

    “It would seem a good idea for there to be a hybrid system. If you mark something as a false-positive in Akismet, it would be good if the code running on your site would do some white-listing or weighting on your end.”

    Yeah, that would pretty much be a perfect solution for me.

  10. GreyDuck says:

    So far, Bad Behavior 2 and Spam Karma 2 have been quite effective for me. Very little even gets through to the “please moderate” stage.

  11. I can sort of understand why; dotinfo domains are selling for 1 euro for the 1st year, so the spammers are probably all over them. The cheap price is awesome because you can create a website in literally minutes (if you’ve already got a hosting provider). The cost is just negligible.

  12. Mark says:

    I’d love to see someone do a comparison between Akismet and Spamlookup (Movable Type’s default spam filtering plugin). A quick googling does not yield any obvious results, perhaps because most people use either Movable Type or WordPress, but not both.

    My impression, from reading people’s complaints about both, is that Spamlookup is slightly better. As I’ve never seen Akismet in action and don’t know how it works, it’s difficult to say. My experience with Spamlookup is great, though, and I’m not totally sure I even understand how that works. But work it does.

    Again, I don’t know how Akismet works, but Spamlookup leverages MT’s commenting moderation, and adds an extra step called the “junk” folder. Things that are obviously spam (obvious even to a computer) are simply shunted to the junk folder. Things that are more borderline (again, to a computer – to a human, these are almost always obvious spam) are moderated. The result being that I get some moderated comments and some false negatives, but not a lot. And I’ve only ever had one false positive. The great thing about this system is that, for the most part, it’s transparent. I don’t generally muck around in the junk folder unless I know someone’s comment isn’t posted (which, again, only really happened once), and the false negatives are relatively low volume (though there is an occasional storm of spam).

    However, I guess I should mention that I automatically close comments on posts older than 60 days (in some cases, this is extended), so my pool of commentable posts is probably a lot smaller than yours…

    Again, I’m not sure how Akismet works, and htAkismet does indeed sound like a bad idea, but then, when I first heard about a spam filter that used DNS lookups and IP addresses, I didn’t think it sounded right either… but that’s a big part of Spamlookup, and it seems to work fine…

  13. Nick says:

    Well, if you want to learn about Akismet, go to the Akismet Website (Irony points here if this post gets flagged as spam…). :)

    The client end is rather simple. From the API documents, when someone posts a comment (or a trackback, etc.), the web server will send the information from that comment (email address of commenter, website put in by commenter, IP address, comment itself, etc.) to the Akismet server which responds with a simple “true” if the comment is detected as spam, “false” if it’s not, or some kind of error code if there’s a problem with the submission.

    Unfortunately, what happens on the Akismet server side is a black box. How they come up with “spam” or “not spam” for a given comment is only kept internally.

  14. Cineris says:

    Re: GreyDuck

    I had some bad problems (Read: Significant numbers of people completely unable to access the blog I had installed it to) with Bad Behavior when I tried it out a while ago. Although my problems were probably particular to my given install, having half of a site’s potential visitors denied access is definitely something that makes me wary of using it again.

  15. SteveDJ says:

    I’m not sure what all this Akismet/htAkismet stuff is, but could it be what is causing this new line at the bottom of all the pages here? Currently, it says “Bad Behavior has blocked 673 access attempts in the last 7 days.” (and Bad Behavior is a link to another site).

    That wasn’t there before. Is that something that everyone is supposed to be able to see?

  16. Ivan says:

    Akismet isn’t perfect, unfortunately. I should know. It even marks my comments as spam sometimes. There’s not really much you can do to Akismet about this, but you should also be running Bad Behavior, to keep out the spammers before they even hit your site at all. (Which means there’s far less junk to view in the Akismet Spam page.)

  17. buy domains says:

    Magnificent goods from you, man. I have understand your stuff previous to and you’re just too magnificent.
    I actually like what you’ve acquired here, really like what you are stating and the way in which
    you say it. You make it entertaining and you still care for to keep it wise.
    I can not wait to read far more from you. This is really a great web site.

  18. Blue_Pie_Ninja says:

    “(I’m not going to decend into madeness and edit that sucker myself.)”

    Yeah, Shamus, I’m pretty sure you will need to edit this post though, if you still care to dig into the archives and fix spelling errors.
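
The hybrid idea from comment 6, layered on top of the yes/no answer described in comment 13, could be sketched roughly like this. Everything here is hypothetical: the `Comment` fields, the `HybridFilter` class, and especially `fake_remote_check`, which is only a stand-in that mimics the ".info gets flagged" behavior observed in this thread. It is not Akismet's real logic and it makes no network calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Set
from urllib.parse import urlparse


@dataclass
class Comment:
    author: str
    author_url: str
    ip: str
    body: str


def fake_remote_check(comment: Comment) -> bool:
    """Stand-in for the remote yes/no service: flags any .info website."""
    host = urlparse(comment.author_url).hostname or ""
    return host.endswith(".info")


@dataclass
class HybridFilter:
    # Any callable that returns True for "spam" can be plugged in here.
    remote_is_spam: Callable[[Comment], bool]
    # Local whitelist, built up from comments the site owner has rescued.
    trusted_authors: Set[str] = field(default_factory=set)

    def mark_false_positive(self, comment: Comment) -> None:
        """Remember the author of a rescued comment so they are not flagged again."""
        self.trusted_authors.add(comment.author.lower())

    def is_spam(self, comment: Comment) -> bool:
        """Consult the local whitelist first, then fall back to the remote verdict."""
        if comment.author.lower() in self.trusted_authors:
            return False
        return self.remote_is_spam(comment)
```

Approving one comment with `mark_false_positive` is enough for that author's later comments to skip the remote check. Whitelisting on the author name alone is easy to spoof, so a real version would probably want to weigh the email address or some other signal as well.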
