Meet the Moderator

By Shamus Posted Sunday Jun 15, 2014

Filed under: Rants 175 comments

We have a problem. The problem is so old and so commonplace that we’ve all gotten used to it. But it’s still a problem. The problem is that the WordPress moderation filters are comically primitive. They’re not even up to 1997 email-filtering standards. In fact, I’m starting to suspect that the spam filter is just a random number generator that marks every 20th comment as spam.

Observe:

comment_moderation5.jpg

On the top we have Henson, who has posted a small comment that contains no common spam keywords. This was posted to the most recent episode of Spoiler Warning. It contains no links. Moreover, Henson has successfully left 64 comments in the past without being flagged as a spammer.

On the bottom we have “residential steam showers”. It’s also worth noting that:

  1. This “person” has never commented before.
  2. This comment was left on a post that is half a decade old.
  3. It is loaded with spam phrases that I have marked as spam again and again and again. (What is with you spammers selling showers and bathroom fixtures? Even if I left every single comment stand, your spam would NEVER build up enough search engine credibility to end up anywhere NEAR the top of the search results. It will never happen. Give up.)
  4. It features a long gibberish URL, which is a common trait among spammers.

But Henson was inexplicably marked as spam, and not residential steam showers. Then we have this:

comment_moderation3.jpg

ps238principal has successfully left ONE THOUSAND SIX HUNDRED AND THIRTY-EIGHT non-spam comments. Yet the spam filter felt the need to flag this reasonable, inoffensive comment as spam.

comment_moderation4.jpg

On the top, “real estate” is leaving a word-for-word reproduction of a comment I’ve marked as spam a hundred times in the past. On the bottom, “hack les simpson” is leaving a comment with goofy manual line breaks, which are common to 80% of all spam and are never used by any human ever. They’re also loaded with phrases that are very common to spammers. (Seriously, spammers love to tell me how nice my site looks. Also they love to use the word “fastidious”. Incorrectly. As in, “this post is in fact a fastidious one it helps new \n net visitors, who are wishing for blogging”.)

comment_moderation1.jpg

I guess it flagged ET because the comment had two links? But they’re to youtube, just like the spam below it. And ET has nearly 700 valid comments. “Doctor Oz” has zero, plus goofy line breaks and spammy content. And for the record, the “1 comment approved” means THIS comment. It doesn’t mean I’ve approved a comment from them in the past.

Also, I am reminded how nice it is to be rid of the Google Adbot. I can write all this without worrying about pissing it off or being paranoid about what ads it will choose based on my content.

OBAMA DOESN’T WANT YOU TO KNOW THIS TRICK FOR ONE CLICK UNDERAGE PAYDAY LOANS FOR FAST WEIGHT LOSS SHOWER HEADS, NO PRESCRIPTION REQUIRED!

(Diabolical laugh.)

comment_moderation2.jpg

Tell me again about the great strides we’re making in artificial intelligence. </facepalm>

This is beyond pathetic. If you can’t recognize these three flagrantly obvious spam comments as spam, then you have not written a spam filter. I don’t know what your software is doing, but it sure as hell isn’t looking for spam. Once again: Steam showers and sex toys with goofy line breaks and sketchy URLs on ancient posts.

This is the tyranny we live under. Our spam filter is like an airport security checkpoint that waves through men in sunglasses with ticking briefcases that have giant nuclear symbols beside digital countdown timers. But then the guards body tackle and strip search little old ladies.[1] It would be one thing if it looked like a slightly buggy system that missed every once in a while, but this is so bad I can’t even tell what it’s using as criteria for spam.

Even more embarrassing: This circus of failure is actually the result of three spam filters: Akismet, GROWMAP, and Bad Behavior.

And to be completely fair: Yes, they do catch more than they let through. My comments would be 90% spam without them. Also, Growmap doesn’t do filtering based on content. It just puts the “Confirm you are not a spammer” checkbox in there. So it really just cuts down on the volume of crap the other two have to cope with.[2]

I could tolerate the occasional spam getting through. But what I can’t fathom are these false positives. There is no pattern or reason to them.

So if you’re curious why sometimes your harmless comment was put into moderation, now you know: NO REASON WHATSOEVER.

 

Footnotes:

[1] So, not all that different from real airport security, really.

[2] When I first installed it, Growmap worked like magic. No spam for weeks. But spammers always adapt.




175 thoughts on “Meet the Moderator”

  1. Johnny Spam says:

    I’m very glad Darth Vader isn’t my father.

    1. Blake says:

      Darth Vader would make a sweet dad!
      Outside of inheriting force powers, he could totally get you all the best gifts.

      1. Felblood says:

        –or at least help you defraud your health insurance for a cybernetic hand.

      2. James says:

        But Force powers aren’t necessarily hereditary; Revan and Bastila’s son wasn’t Force-sensitive, though his grandkid was.

        On the topic of the post, man am i glad i don’t run a site with comments jesus!

        1. Kylroy says:

          Yeah, Comments Jesus is a real bear to work with.

          1. Comments Jesus says:

            …I’m right here, y’know.

            1. Lifestealer says:

              Everyone hide! He’s on to us!

              1. Duneyrr says:

                This site is the only site on which I read comments. This is why.

    2. It looks pretty fun in the books
      “Darth Vader and Son” and
      “Vader’s Little Princess”
      (both pretty hilarious)

  2. McNutcase says:

    I’ve been assuming moderation is triggered by phase of the moon weirdness for years now. Nice to see confirmation that the spam filter is indeed completely deranged.

  3. GEBIV says:

    Blah blah blah…. all I hear is vikings singing “Spam spam spam spam spammity spam.”

    I may have issues…

  4. boz says:

    Can you write a growmap alternative?
    Put two checkboxes one visible one not
    Confirm you are not a spammer? accept ticked comments
    hidden: Confirm you are not a spammer? accept non ticked comments

    unless they are specifically targeting your site that should stop generic growmap bypass methods.
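The two-checkbox idea above can be sketched as a server-side check. This is only a minimal illustration, not how Growmap actually works: the field names `confirm_human` and `confirm_human_hidden` are made up for the example, and it assumes the second box is hidden from human visitors with CSS so that only a bot that ticks everything will set it.

```python
def accept_comment(form: dict) -> bool:
    """Accept a comment only if the visible box is ticked and the hidden one is not.

    `form` stands in for the submitted form fields. Both field names are
    hypothetical; a real plugin would choose its own.
    """
    visible_ticked = form.get("confirm_human") == "on"
    # Humans never see this box, so a ticked value suggests an automated submitter.
    honeypot_ticked = form.get("confirm_human_hidden") == "on"
    return visible_ticked and not honeypot_ticked
```

A bot that ticks every box fails the second test, and a bot that ticks none fails the first. A bot written specifically for this site would still get through, which matches the caveat in the comment.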

    1. ET says:

      What about a system, where it randomly assembles a logical problem for a human to solve? Then have a multiple-choice type area, with four subtly different answers. Obviously, check-mark boxes, in case the bots are still dumb enough to check them all. ;)

      Is ReCaptcha any good? I know it was working as intended for a while, but I hear rumors every once in a while, that it’s either beatable by bots, or by click-farm type things in like [name of country here] where peoples’ wages are cheap enough to buy for pennies.

      1. Cybron says:

        I dunno if it’s beatable by bots, but I can verify that it can and has been beaten by spambots. I think they’re the second kind, where people manually solve them for pennies.

        It DOES shut down a lot of spam though. It can be very annoying to fill out though.

      2. ReCaptcha is slightly disturbing now that the project to OCR just about every book out of copyright (and then all the ones in it thanks to Google’s excellent lawyers) has finished and now we’re all being used to help identify house numbers on photos of streets so that future generations can be more accurately targeted by the first strike of Skynet (why aim a missile at the postcode GPS tag when you’ve got the house identified precisely in a photo?).

        Of course, I have to assume that the hackers, crackers, and pirates realised what ReCaptcha meant as a collective work to defeat OCR errors and protect content from automated machines. Every time someone enters a Captcha to download a questionable link or sign up for a forum to discuss cracking and piracy then I’ll assume that website is talking to an evil(er? See above about house numbers) service like ReCaptcha, only one that is fed images from ReCaptcha and wants to turn them into the right response so it can piggyback that to submit some spam or try entering in a new bank account login without human intervention. Use the humans who are proving they are human on piracy sites to remove the need to be a human to defeat a Captcha to prove you’re not a bot submitting spam on a site that thinks it is protected. Paying pennies for this done via Human Turks? I can find you thousands of people who just want to crack an executable who will be happy to do the Captcha work for free!

    2. Volfram says:

      I like the two-checkbox idea.

      My blog has manual approval for all users. Once you’ve been approved, you can post as much as you want(unless I unapprove you).

  5. ET says:

    The really funny part, is that my comment was flagged before I edited it, when it only had one link to YouTube. Plus, I can’t even see a pattern sometimes when I’m flagged. Like, about 2/3 of the time, it’s when I use some keyword or too many links, but the rest…as you said – totally no reason. :)

    I have an idea, although it might be a bit drastic – disable in-post comments, and then just auto-generate a post in the forums. It’s how the Ghost blogging software does it.* It’s the new sexiness! Although I imagine you’d just want better filters. :P

    * Technically, Ghost has no commenting features whatsoever, but there’s a few plugins which auto-gen the forum posts, for a couple popular forum softwares. :)

  6. Fabrimuch says:

    yes very fastidious website, nice design.
    what abbot steam shower here http://www.com

    Just kidding :P

    I will never stop wondering why spammers write so poorly. Their system would work much better if they wrote in actual English syntax.

    Seriously though, I’m amazed THREE spam filters can fail that hard. How do they work anyway?

    1. ET says:

      Probably because it’s hard to Madlib an existing English sentence, and still have it sound correct. It’s much easier to write horrible-sounding quasi-English. ^^;

    2. Jay says:

      They aren’t looking for smart, skeptical customers. They’re looking for the gullible, and they write their ads appropriately.

    3. Daemian Lucifer says:

      Damn you,I wanted to do that joke!

    4. McNutcase says:

      The common spam patterns are flagged by now, so they try to defeat that by running everything through an auto-substituting thesaurus filter. Which leads to utter absurdities, because synonyms are really freakin’ hard to automate.

      1. DIN aDN says:

        Damn it, ninja’d on realising why spam would have the word ‘fastidious’ in it.

        Still, you have to admit it’s a just the alcohol, please trick.

      2. Volfram says:

        I wonder how difficult it would be to run an aggregator to check how often particular words are used. Run the aggregator on a sentence, and if you find too many highly uncommon words, flag for spam.
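As a rough sketch of that aggregator idea: count how many words in a comment fall outside a list of common words, and flag the comment when the share is too high. Everything here is an assumption for illustration. `COMMON_WORDS` is a tiny stand-in for a real word-frequency table, and the 0.5 threshold is arbitrary.

```python
import re

# Tiny stand-in for a real frequency table (assumption: a real filter would
# load thousands of common words from a corpus).
COMMON_WORDS = {
    "this", "is", "a", "nice", "site", "the", "and", "to", "of",
    "i", "you", "it", "in", "that", "for",
}


def rare_word_ratio(text: str, common: set = COMMON_WORDS) -> float:
    """Return the fraction of words in `text` that are not in `common`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    rare = [word for word in words if word not in common]
    return len(rare) / len(words)


def looks_spammy(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose share of uncommon words exceeds `threshold`."""
    return rare_word_ratio(text) > threshold
```

With a word list this small nearly everything would be flagged, so the sketch only shows the shape of the check. It would also flag anyone who simply enjoys long words.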

        1. There may be a constituency among us, albeit meager and scant, who would be discommoded by such a schema. Sesquipedalians also possess affect. If you lacerate us, do we not exsanguinate?

    5. I was under the impression that no spammers “write” anything. It’s all Markov Chains that put together almost coherent sentences from what they read on the site or from similar pages.

      Here’s a non-spammy example of them in use: Garkov, Garfield strips generated by applying the Markov model to old strip scripts and putting the result in Garfield comics. I’m sure the syntax will be familiar.

      As for successfully identifying spammers, I’m not sure there’s currently a solution beyond having a meatbag look at the incoming posts. I’ve kinda-sorta figured out when a post of mine is going to get flagged for review (mentioning the metaphysical, recreational chemicals, too many links, etc.), but sometimes I do wonder if the TSA is in charge of randomly checking my virtual shoes. :)
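For anyone curious what a Markov chain text generator looks like, here is a minimal word-level sketch. It is a generic illustration and not the code behind Garkov or any actual spam tool; the function names are invented for the example.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain: dict, start: str, length: int, seed=None) -> str:
    """Walk the chain from `start`, picking a random follower at each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

Each word is chosen by looking only at the word before it, which is why the output tends to be locally plausible and globally meaningless.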

      1. Andrew_C says:

        What I find hysterically funny is that the Garfield strips just read like randomly generated Garfield strips, but the Big Lebowski and particularly the X-Files ones actually feel right, but perhaps that’s because I’ve been reading too much of Shaenon Garrity’s Monster of the Week http://www.shaenon.com/monsteroftheweek/

        EDIT speklibng

        1. Asimech says:

          That thing you linked to is the best thing.

  7. Morgan says:

    This blog is much informative, gives guidance!

    Buy showercurtain!

    http://www.buyshowercurtains.com/ahduwh139d*9)^2#j20470kdnvkalqwlu^(2j0u76%*9hdsajkn

  8. Paul Spooner says:

    Aww, I was hoping the last line of the article was going to be something like:
    “So, that does it. I’m writing my own!”

    Pretty please?

    1. Veylon says:

      I second this. Your programming projects are always fascinating and informative. I’d love to see you take a good solid swing at this program.

      1. Ingvar M says:

        Spam filtering is one of those “never-ending” tasks that isn’t very fun, once the initial framework has been laid down. Because at that point, it all boils down to “why that and not that?” and tedious reverse-engineering of mountains of numbers. And anytime you tweak anything, you upset the whole spamcart and need to pick up all the pieces and put them back together into a semblance of a filter, again.

        Admittedly, I’ve only been peripherally involved in the writing of spam filtering (but, I ran a usenet server, back when the Canter & Siegel green card spam started making the rounds).

        1. What about Bayesian?
          (Isn’t that what they call those things where every time you flag something as spam, or not spam, it learns to do it better? What that has to do with Bayesian statistics I’ve never understood, but they seem to call it that)
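On the question of what this has to do with Bayesian statistics: the usual approach applies Bayes' rule, estimating the probability that a comment is spam given its words from how often those words appeared in comments already marked spam or not spam. Below is a small naive Bayes sketch under that assumption. The class and method names are invented for illustration, and this is not a description of how Akismet or any other real filter works.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-count spam filter that learns from flagged examples."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one comment flagged as 'spam' or 'ham' (not spam)."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | words) using Bayes' rule with add-one smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how common this label has been so far (smoothed).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Likelihood: how often this word appeared under this label.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Convert the two log scores back into a probability.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)
```

Every call to `train` shifts the word counts, which is the "it learns every time you flag something" behaviour described above.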

    2. Thomas says:

      I have no clue of the scope of such a project, but that was the line I too hoped to hear =D

  9. I now feel slightly less bad about giving up on trying to decode the spam triggers for this comment system. “NO REASON WHATSOEVER” seems to be about where I’d arrived at when I gave up, realising that there seemed to be very little linking which comments were held for moderation and which weren’t. My occasional propensity for linking to external sources (I blame 15+ years of exposure to a master of anchor use) often fall foul of other filter systems but this one seems rather more random than correlatory on factors like links.

    1. Chris Kerr says:

      Somehow I knew where that link was going without looking. Spooky.

      (If it saddens you that Dan hardly ever posts anymore, you can find his everyday ramblings in his Reddit comments)

  10. Adalore says:

    I remember one of my comments being caught by the moderation thing. I was like “Ah whatever.” :p

  11. Daniel says:

    I would like to apologize to Shamus for the confusion, on behalf of all the people leaving fake-spam joke comments.

    1. Rick says:

      Hey that’s cool man.

      By the way, you can play this game for FREE, from your own home and earn up to 15,000lb a month by doing absolutely NOTHING!

  12. krellen says:

    So, while I’m relatively certain that any comment from me that has the word copyright in it gets flagged as spam, I’m willing to bet that now that I’m actually saying this openly, this comment will go through just fine.

    1. krellen says:

      Also, I believe anonymous flags, but it won’t flag now.

      1. krellen says:

        See, Shamus, the spam filter truly is an AI. It’s smart enough to mess with our heads.

        1. ET says:

          I don’t know what’s worse – the idea (from countless sci-fi works) that the AI will kill/enslave/exterminate us, or that it will use its vast intellect to be a giant internet troll. ^^;

          1. Kana says:

            I think if an AI came from the internet, it’d have bigger problems of 4chan, reddit, tvtropes and the likes clouding up its mind than exterminating the fleshy humans. Being an internet troll is about the best it could hope for…

            1. ET says:

              Quick – flood the internet with memes; It’s our only defense against the robuts!

          2. MadTinkerer says:

            Schrodinger’s Turing’s Test:

            People keep bringing up the idea that in the future there will be an AI that can effectively troll, without realizing that the only way to know for certain that it isn’t already happening is to manually travel to every troll’s house and make sure they are there.

            Memes are repeated ideas that survive based on how funny/entertaining they are judged to be, and re-posted accordingly. Spam is usually built from repeated phrases that survive based on how the bots tending the database notice which ones are the most successful at beating spam filters. If one were to create an AI that joins in the process of mutating memes and filtering its own continued attempts by what people seem to re-post the most… are you 100% sure that hasn’t happened already?

            Are we really, truly, 100% certain that “Anonymous” is not mostly AI already?

            1. Alexander The 1st says:

              We are Legion. We are Geth.

              Um…

              I mean…YOU DIDN’T SEE ANYTHING…

    2. Daemian Lucifer says:

      If you want to deliberately trigger it,just mention the original fallout*drink*.

      1. Daemian Lucifer says:

        And it worked.

        Here,a proof.

        So Shamoose,you now at least know that that is one of the key phrases.

        1. Decius says:

          I think that it’s “intentionally trigger it”, not the original fallout*drink*.

          1. syal says:

            So did “intentionally trigger it” intentionally trigger it?

            no.

            1. Thomas says:

              Okay so lets do the other part

              the original fallout*drink*

              Haha, Yep straight into moderation. This is brilliant

              1. ET says:

                LoL, I disbelieve you! Nothing like this would ever happen in the original fallout*drink* hah!

                EDIT:
                Hah! I win! It didn’t trigger on me! XD

              2. Volfram says:

                Is this like a new drinking game?

                drink every time someone says fallout*drink*?

                [edit] NO MODERATION FOR MEEEEEEEEEEEEEEEEEE!

              3. swenson says:

                Let’s see if the original fallout*drink*

                does anything.

                e: darn, I was very hopeful. :(

                1. Thomas says:

                  Maybe the no. of times it was coming up in this thread (and being accepted) detriggered the keyword temporarily?

                  EDIT: Gahh forgot to correct my email address. That’s one ugly gravatar

  13. Felblood says:

    In the years I have used this site, my flagged posts have contained links to Dwarf Fortress Forum threads, or ad hominem attacks.

    So, maybe the system does work.

    1. Felblood says:

      Does anyone else remember back when every comment thread had DF zealots trying to get Shamus to invest the 200+ hours to learn the game?

      Example: http://www.shamusyoung.com/twentysidedtale/?p=1639

      Good times.

      1. Thearpox says:

        New release is coming soon, this time with adventure mode focus. So we might just get on that.

        And I personally find DF Forums to be even more golden than the game. Did you read the spam thread on the Forums, where people were discussing the various spambots? Because that’s what the comment below was inspired by, but I couldn’t find the link.

      2. Daemian Lucifer says:

        The only change is that its dark souls now.

        1. Eric says:

          Aw come on, Dark Souls only takes 50 hours to learn to play well, not 200.

  14. Thearpox says:

    I refuse to acknowledge your spammers as legit until you get at least several voodoo witches who promise that they are going to help you improve your sex life.

    Also, how come no mention of the CHEAP DYI KICTHENS?
    Steam showers are lame. I am a devoted follower of the CHEAP DYI KITCHENS brand.

  15. Henson says:

    The WordPress robots saw your website background of dice and decided to use the same selection method.

    So, related story: I use Yahoo for email, and they’ve changed their spam systems within the last year. I get hardly any spam in my inbox (good), but I also can’t send email in any suspicious manner; for instance, if I send email from a new location, or if I send a lot of emails in a short time, or if I send emails to a lot of addresses at once. I don’t exactly know how it works, but if Yahoo finds something they think is suspicious, the message I’m writing doesn’t get sent, and I have to go through some authentication to be able to send mail again. This process can take up to a full day. It’s become infuriating that I can lose my ability to send email – my primary mode of communication – just because I hit ‘Reply All’ at the library. I’m battling fascist algorithms and I don’t even have a weapon.

    Also: hey, I got marked as spam! I’ve finally arrived! I’ll be sobbing in a corner if you need me.

    Edit: And my comment’s in moderation. This seems to happen to me a lot around here.

    1. Amstrad says:

      Hate to be ‘that guy’ but.. you really should be using something other than Yahoo mail. It’s cool they’re making strides to improve it, but it’s still leagues behind say Gmail.

      1. Andrew_C says:

        For various reasons I have email accounts with all the major webmail providers, and In my experience Google actually has the worst spam filters. Yahoo and Hotmail are better. Which is the reason I don’t use Gmail as my primary email.

        Although, Yahoo has gone downhill recently in the quality of their spam filtering, becoming almost as bad as GMail. And lets not mention their webmail interface in polite company.

        But that’s just my experience.

        1. ET says:

          Might be a problem with the spam happening in your particular region; I myself only get one spam via Gmail, in like…10^3 emails? 10^4?

      2. Henson says:

        I actually haven’t had too many problems with Yahoo…until this last year. It serves its purpose, it’s organized just fine. And I’m reluctant to change my email address for the first time in twelve years – especially since the only alternative I know about is Gmail, and I feel uncomfortable about Google having so much ubiquity in electronic communication. But there may come a point where this headache is simply too, too much. It will become a day of reckoning.

      3. Moridin says:

        It has unlimited space(not that the limits are low enough to be relevant in gmail either) and it actually allows you to organize the emails into folders, which is leagues ahead of the un-organized tagging system gmail uses, at least for my purposes.

        1. ET says:

          You can force Gmail’s labels to work as folders, but it’s cludgy. In fact, it might only be possible to make them work like folders, in the case where you’re making filters. I’ve never actually tried manually labeling stuff.

      4. Steve C says:

        Gmail used to be really good. Now not so much. I’m seeing a lot more false positives with gmail in situations where it doesn’t make any sense.

        For example I have an email account that forwards to my main email account. The first account will let a legitimate email go through as not-spam. But the 2nd account flags it as spam! That makes no sense. It’s the same filter running twice! Except the 2nd time it’s going from a trusted and authorized email address to a 2nd trusted and authorized email address. Spam filters really are random.

        I also discovered that there’s no way to turn off Gmail’s spam filters or even to create a whitelist. It boggles my mind.

        1. Ziusudra says:

          One of the options on Gmail’s filters is to “Never send it to Spam”.

          1. Steve C says:

            “It” being one message or some voodoo based on that one message. “It” doesn’t work. I need the option “Never send anything to Spam”

            1. Bryan says:

              Can’t you create one filter on “to:me” that’s “never send to spam”, and another “-to:me” that’s also “never send to spam”? That should cover everything.

  16. Bropocalypse says:

    Admittedly I have FAR fewer viewers on my site than you do, but I have a plugin that has a very simple, random-format math problem spambot checker, which commentors can circumvent altogether if they register. Admittedly, I do get some junk registrations on my site, but these bot-users mysteriously never leave any comments.

    Of course, I do get some spam in the filter anyway. They’re all of the same style that you have here. What’s the deal, anyway? I wonder if they’re bots at all, but some sort of underpaid Chinese spam-sweatshop workers. It helps explain how they can figure out HOW to comment, but have English so horrible that it basically means nothing.

    1. ET says:

      This kind of situation is what I was dealing with when I was selling my laptop on Kijiji. They show juuust enough intelligence so you know they’re humans, but the texts I was getting were poorly formatted enough that I know they’re not native English speakers. “I’ll pay you X + 300 for shipping it to my sister-in-law. What’s your paypal?” Yeah…like I’m dumb enough to wait for a cheque that’s never going to come. :P

  17. Destrustor says:

    I have a sudden urge to go buy bathroom and shower accessories.

  18. ehlijen says:

    Just my guesses on the false positives:

    First one (Henson) uses all caps and special symbols for almost an entire line. Sometimes this indicates flame posts, but obviously not here.

    Second one(principal) contains both a well known author’s name and a book store name.

    Third one (ET) has got to be the videos along with unconventional punctuation. Nothing wrong with stream of consciousness sentences in my opinion, but they do tend to flag as incorrect English.

    But I have no idea how the others got through. Maybe institute a policy that automatically flags everyone’s first ~10 posts? More chores for you, obviously, and that’s the last thing any of us’d want for you.

    1. Daemian Lucifer says:

      OK,LETS TRY out taht theory ^&.

      EDIT:Nope.

      1. krellen says:

        If it flagged unconventional punctuation and formatting, every single post Daemian ever made would be flagged. :)

        1. ET says:

          Daemian, y u no put spaces after commas? XD

          1. ehlijen says:

            Then it appears that I am as clueless as the rest of y’all :D

          2. swenson says:

            I have wanted to know this ever since I started reading comments here.

            1. Daemian Lucifer says:

              Laziness at first,but it evolved into my style afterwards.Thats how youll know if I was overtaken by a spambot.

  19. Cybron says:

    The fact that they don’t have a way to enable commenters to ignore the spam filter after a given number of approved posts seems very silly.

    1. Thomas says:

      Especially since the threshold could be so low. Has the commenter three approved comments and no comments currently in moderation?

      If the spam filter is 90% successful, only 1 in 1000 spam bots get past that threshold. Up it to four approved comments and it’s 1 in 10 000.
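The arithmetic behind those figures, as a quick sketch. It assumes each spam comment is caught independently with the same probability, which is a simplification; the function name is made up for the example.

```python
def slip_through_rate(catch_rate: float, required_approvals: int) -> float:
    """Chance that a spammer gets `required_approvals` comments past the filter
    in a row, assuming each one is caught independently with `catch_rate`."""
    return (1 - catch_rate) ** required_approvals


for approvals in (3, 4):
    rate = slip_through_rate(0.9, approvals)
    print(f"{approvals} approvals: about 1 in {round(1 / rate):,}")
```

With a 90% catch rate each spam comment has a 1 in 10 chance of getting through, so three in a row is 1 in 10 × 10 × 10.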

      1. Bryan says:

        But how hard is it to impersonate some other user? You have to guess at the email address, and/or the name (…but no, not the name, as there are like five different Bryans leaving comments here, only three of which are me — two are older email addresses, so that’s not *quite* as bad as it sounds), but it’s not like there’s any kind of authentication of that address.

        And it’s not like we all want to give Shamus a pgp public key, or an ssh public key, or some such actually-cryptographically-secure option. (Not to mention that browsers can’t do that. And users generally hate it. In fact, forget I mentioned it altogether. :-P ) Or even a password.

        Dunno, but I think the idea of requiring some math has some possible merit. The question would be how you decide which math problem to give any given page view, and how to tie the POST data back to the original question you asked. It’s not like you want to provide the numbers — or worse, the answer you need to see — in a hidden input field, as that’s pretty spoofable too.
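One common way to tie the POST data back to the question without putting the answer in the form is to send a signed token instead. The sketch below uses an HMAC over the expected answer and a timestamp. All names here (`SECRET_KEY`, `make_challenge`, `check_answer`) are hypothetical, and this is an illustration rather than what any existing plugin does.

```python
import hashlib
import hmac
import random

# Hypothetical server-side secret; it must never be sent to the browser.
SECRET_KEY = b"replace-with-a-private-server-side-secret"


def _sign(answer: int, issued_at: int) -> str:
    """Return an HMAC over the answer and the time the question was issued."""
    message = f"{answer}@{issued_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_challenge(now: int, rng=random) -> dict:
    """Build a question plus the hidden fields to embed in the comment form."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return {
        "question": f"What is {a} + {b}?",
        "issued_at": now,               # hidden field
        "token": _sign(a + b, now),     # hidden field; reveals nothing without the key
    }


def check_answer(answer: int, issued_at: int, token: str, now: int,
                 max_age: int = 3600) -> bool:
    """Re-sign the submitted answer and compare it with the token from the form."""
    if now - issued_at > max_age:
        return False  # stale form, possibly a replayed token
    return hmac.compare_digest(_sign(answer, issued_at), token)
```

The hidden fields cannot be forged without the key, so a bot cannot swap in a question it already knows the answer to. It can still read the question text and do the sum, or simply try every possible answer, so this addresses the spoofing worry and nothing more.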

        (And of course this comment gets spamflagged. I, for one, welcome our new robot overlords… hooray random numbers!)

        1. syal says:

          There’s already a manual override. If a veteran loses their address to a spammer you just nuke them.

          …I don’t like the idea of math problems as filters; I’ve heard that computers are kind of good at those.

          1. Daemian Lucifer says:

            I propose higher math filters.Something like log(100),ln(e),7′,sqrt(-4),etc.

            1. Duneyrr says:

              But I can’t solve that. I’ll never be able to post again! D:

          2. ET says:

            English-language logic problems!

        2. Thomas says:

          If you use their IP address as their identifier that would probably be good enough, right? There’d be some exceptions but it would catch the vast majority

          1. Chargone says:

            Large parts of the world (including mine) use variable IPs as standard. Seriously, american systems which try to identify my location to ask if such and such is me run face first into the fact that Everyone in the Country using Telecom (was the main phone company, was a sub unit of the state post office. Has now been split into a bunch of sub bits… long story) as their ISP shows up as being in Auckland, because the fact that the ISP’s systems are there is the last bit of geography based IP data there is.

            So… yeah, IP addresses don’t help much unless you want to ban significant sections of what may well be entire small countries.

            1. ET says:

              Dynamic IP addresses have been in my small city for over a decade. So, yeah, IP addresses are no good for identifying people.

              1. Thomas says:

                But we don’t need 100% efficiency here. People whose IP addresses aren’t stable just get thrown to the standards we had before, except a bunch of false positives have been removed.

                I guess you’d need to make it more careful about counting no. of accepted/rejected in case someone’s IP keeps crossing over with a spammer, but that isn’t too bad.

  20. Kana says:

    Speaking of wordpress being silly, I know it’s super late but I still want to apologize for spamming twentysided a couple times in the past. For whatever reason when I still had my blog, mentioning anyone else’s in any form of link would have wordpress go back and automatically comment on whoever’s blog I linked to. And then post a link back to my blog. God that was mortifying to find out, since I’d linked to more than a couple pages and people.

    I’m so sorry. ;~;

  21. Neko says:

    I was writing my own RSS reader a few months ago and ran afoul of your Bad Behavior plugin. After trying a dozen permutations of settings, I eventually found that if I spoofed the User-Agent and specified what encodings I was accepting, the plugin would graciously allow me to get the RSS feed so that I could put links to your posts along with all the other links to posts on blogs I follow.

    I’d been meaning to mail you about it, since the Bad Behavior rules seem completely arbitrary. However, since this was just a silly little script to aggregate links for my own personal use, I felt I hadn’t really put enough diligence into figuring out what was wrong on my end and exactly what innocent behaviors were frowned upon by the plugin, so I figured I’d just stick with what works and put a more thorough investigation onto the backburner.

    As I did my Honours year in AI (Machine Learning), I’d offer to write you your very own neural-net enhanced filter, but ewww, PHP. I’m more of a Perl, Java and C++ guy. The one email classification system I did make was nice and accurate, but it was academic software designed to rigorously come up with accuracy metrics for different algorithms and not really usable as a day-to-day system.
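For anyone hitting the same wall with a personal feed reader, here is a minimal sketch of the workaround described in this comment: send an explicit User-Agent and say which encodings you accept. The URL and header values below are assumptions for illustration; the comment does not say which values the plugin actually accepted.

```python
import urllib.request

# Hypothetical values: placeholders, not the settings the commenter used.
FEED_URL = "https://example.com/feed/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; PersonalFeedReader/0.1)",
    "Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.8",
    # "identity" asks for an uncompressed body, so no gzip handling is needed.
    "Accept-Encoding": "identity",
}


def build_feed_request(url=FEED_URL):
    """Return a Request carrying explicit client headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_feed(url=FEED_URL, timeout=10):
    """Download the feed and return the raw bytes."""
    with urllib.request.urlopen(build_feed_request(url), timeout=timeout) as resp:
        return resp.read()
```

Whether a given set of headers satisfies the plugin is something you would still have to find by trial, as the comment says.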

  22. Daemian Lucifer says:

    Your comment is awaiting moderation.

    I wonder how many of you will get unnerved by this.

    1. Henson says:

      This comment is the epitome of evil. You bastard.

    2. Grudgeal says:

      After my first ten posts or so that got that and somehow still got responded to, I just guessed the site automatically body-checks anything sufficiently verbose and lets it through as soon as a human has taken a look at it. Nowadays I hardly even notice when it shows up.

  23. MadTinkerer says:

    “Even if I left every single comment stand, your spam would NEVER build up enough search engine credibility to end up anywhere NEAR the top of the search results.”

    *beep*

    turing test failure.

    please rephrase in the form of a SQL injection to the database the botnet uses.

    then visit our latest client’s CHEAP CANADIAN MEDZ 4 M@nh00d 1mpr00vm3nT site and leave feedback, in Ukrainian. our human coder is currently working on that site. Дякую

  24. Decius says:

    Can you tell which comments are real and which ones are spam if you rot13 them first? (Or, if you can read rot13, rot20?)

    Can you write rules that identify spam more specifically and more sensitively than your current filters?

  25. Shamus, maybe it’s time to use a login for the comments and then white list people that have made maybe 10 comments without pissing off people too much. (and you can always un-whitelist somebody later).

    Provided the login stuff can be done “in” the comment box/area somehow then I would personally not mind that.
    The “regulars” are usually the biggest posters, and while the post count may go down a little, the false positives should drop dramatically (and make moderating a little easier hopefully with less posts to check).

    1. Daemian Lucifer says:

      I thought of this as well,but after some thought I figured that its a bad idea.It has happened to me a few times that I forgot the correct “login” for myself when writing posts from different computers,and I dont change them that much.So I can imagine why this would be a hassle for people that often switch them.

      1. 4th Dimension says:

        Maybe dump the login part but automatically flag first ten posts of any username-email pair and don’t let any through until at least 75% get human approved.
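A rough sketch of the probation rule suggested here, assuming a reading where a username-email pair becomes trusted once ten of its posts have been reviewed and at least 75% of them were approved. The class and method names are made up for illustration and are not part of WordPress.

```python
from dataclasses import dataclass

PROBATION_POSTS = 10       # "first ten posts" from the comment
APPROVAL_THRESHOLD = 0.75  # "at least 75%" from the comment


@dataclass
class Record:
    approved: int = 0
    rejected: int = 0


class ProbationTracker:
    """Tracks moderator decisions per (username, email) pair."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _key(username, email):
        # Normalize so "Bob " and "bob" count as the same poster.
        return (username.strip().lower(), email.strip().lower())

    def record_review(self, username, email, approved):
        """Store one human moderation decision for this pair."""
        rec = self._records.setdefault(self._key(username, email), Record())
        if approved:
            rec.approved += 1
        else:
            rec.rejected += 1

    def is_trusted(self, username, email):
        """True once enough posts were reviewed and enough were approved."""
        rec = self._records.get(self._key(username, email))
        if rec is None:
            return False
        reviewed = rec.approved + rec.rejected
        if reviewed < PROBATION_POSTS:
            return False
        return rec.approved / reviewed >= APPROVAL_THRESHOLD
```

As the replies below point out, this only holds up if the username-email pair is hard for someone else to guess.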

        1. How many posts have you (or should I say “I” ?) made so far?

          1. Thomas says:

            If our email addresses aren’t easily visible, then even that method isn’t a concern, I mean it’s obvious both to us and the computer that you aren’t 4th D. The chances of a spammer correctly guessing the correct username and email address are pretty low.

            If not, there are other methods for keeping track of people which are more effective.

            1. Tom says:

              Just checking that it is our email addresses/username combination atm

              1. Thomas says:

                Yep. But there’s still other data even if our email addresses are findable somehow. I’ve never seen a spambot post as me or anyone else which is a good indicator that they don’t have that technology

                1. Or that Shamus and the three spam systems caught it. (I wonder if Shamus has any stats on that.)

                  Edit: Oops. I was supposed to post that as myself and not 4th Dimension, sorry about that, I hit the save button too soon and didn’t notice the pre-filled info in the form.)

                2. A lot of the regulars here probably have an email address that is not that hard to dig up.

                  Basing whitelisting on a login-less solution is just asking for trouble as spammers would then directly target long time posters and spam on even old posts on the site. If whitelisted then Shamus would be unaware of this occurring.

                  You might say that some spam checking should be done still on whitelisted ones, but what is the point of doing that, the purpose of whitelisting is that you do not have to do that.

                  1. syal says:

                    If Shamus were unaware of whitelisted posts we wouldn’t have this post telling us how spammers are getting through.

            2. Is this still as obvious?

              (Yeah, I figured out 4th Dimension’s gmail address.)

              1. Follow up to the example above. I managed to mine (by hand) a lot of info about 4th Dimension. I’ll email Shamus and 4th Dimension on the details.

                It was way too easy to dig the info (easier than I imagined) and no special tools were needed.
                Due to the way certain things work it may or may not be possible to mitigate it.

                1. Duneyrr says:

                  Yikes, that’s pretty intense!

      2. ET says:

        We wouldn’t need to remember a new login, if Shamus used OpenID! It’s the future of logins! :)

        1. There are a few things to keep in mind but yeah OpenID is one solution.

          Though Shamus could just as easily tie the forum login into the comment system.

  26. Erik says:

    Great, now i have the TF2 music stuck in my head. Thanks for the title, shamus!

  27. Nick-B says:

    Personally, I think shamus should just keep changing the check box logic. First, change the checkbox to “Confirm you ARE a spammer”. This may not help – now that I think about it – since it is probably holding at bay hundreds of comments per post.

    But next, add another. Make it so we have to check ONE but not the other. After a while, reverse them. Then after a while, make it so you have to check BOTH. Get creative. Try double negatives: “Confirm you aren’t NOT a spammer.”

    Edit: And… “Please check the box to confirm you are NOT a spammer” got me again.

    1. Daemian Lucifer says:

      That will just put it into DRM territory of punishing legit users while doing little to remove spam(granted,it will reduce spam,but will the reduction be more than the annoyance of legit posters?I doubt it).

    2. syal says:

      “There are six check boxes. Five always tell the truth, and one always lies. Check the Boxes Of Truth to post your comment.”

      And the correct answer is typing “cast Mass Death” into the captcha.

      1. Bloodsquirrel says:

        Give every poster a screenshot of a 2nd edition D&D character sheet and make them figure out the character’s THAC0.

        1. Bryan says:

          Well, that’s it, I’d be gone. :-P

          (Edit: …just in case it’s not obvious. Not because I dislike 2E or anything like that. Because I have no idea how to calculate that. :-) I suppose maybe I could look it up…)

      2. How about a captchathing that goes something like “Halt! Before you cross over the bridge of death, you must answer me these questions three!”

  28. Nick says:

    Not sure that this helps, but when we had an ungodly number of spammers on our phpbb forum, we added another field into the PHP of the signup sheet with just ‘Type student’, with an appropriate input box next to it.

    Then the script that the form submits to was edited to die if the field didn’t have the word student put into it.

    It seems that many spambots will autotick boxes in a form but they can’t fill out a field that just tells you to type a word.

    Obviously very hacky and might not be editable in wordpress, but yeah that’s how we got around it

  29. Grudgeal says:

    “All right I confess, I’m a spammer. This entire post is crammed full of links to irrelevant useless sales pitches. I’ve purposefully been trying to deceive WordPress’ spam filter. I’ve been a bloody fool.”

    “…I don’t believe you, sir.”

    “It’s true. I’m, erm, spamming.”

    “Don’t give me that, sir, you couldn’t spam an eight-year-old’s home-made livejournal, let alone bring advertising posts through this spam filter.”

    “…What do you mean?! I’ve spammed this blog before! I’ve spammed shower heads, work-from-home scams, luxury watches and male enhancement pills – you name it, I’ve spammed it.”

    “Now come along please, you’re wasting our time. Move along.”

    “Just look at this link! Look at it!”

    “Look, for all I know, sir, that’s some kind of new tinyURL linking to a genuinely insightful blog post.”

    “I wouldn’t make a tinyURL containing four hundred characters, half of whom are ampersands!”

    “People do. Now take your post and move along. Stop trying to waste our time; we’re out to catch the real spammers.”

  30. Mephane says:

    NO REASON WHATSOEVER

    Thanks, now I have the voice of Mr Torgue stuck in my head. XD

  31. Tektotherriggen says:

    I wonder if all the steam shower spammers target you because you’ve said how much you like Steam in the past?

  32. Tometzky says:

    You should add a question to your comment form, which would be simple to answer for a human and impossible to guess for a robot. For example: “What is the first name of this blog owner?” and reject any comment in which the answer does not match the PCRE regular expression “/shamus/i”.
    And when spammers adapt – just change this question.

    I think I can code a wordpress plugin for this for you if you wish.
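The check itself is tiny. This is a sketch in Python rather than a WordPress plugin, and the function name is invented for illustration; `re.IGNORECASE` with `search` plays the role of the unanchored, case-insensitive `/shamus/i`.

```python
import re

# Python equivalent of the PCRE pattern /shamus/i from the comment.
ANSWER_PATTERN = re.compile(r"shamus", re.IGNORECASE)


def passes_challenge(answer):
    """Return True if the submitted answer contains 'shamus' in any case."""
    return bool(ANSWER_PATTERN.search(answer or ""))
```

Changing the question when spammers adapt then means swapping the pattern and the prompt text together.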

    1. Felblood says:

      Didn’t we actually used to have something like that here?

      I seem to remember having to type “D20” every time I posted at one point.

  33. Neko says:

    One random idea that might be simple to implement:

    Include some form fields called Name, Email, Website, URL, Favourite Colour, whatever.
    Rename the existing fields to something nonsensical.
    Update the form submission code to use the nonsensical fields, and flag anything filling in the “normal” fields as being spam.
    Use some CSS to hide the honeypot fields and rename the nonsensical fields to what they should be.

    Now this probably raises a few accessibility issues, but it might cut down on a lot of the automated comment posts, because the bot will see the trapped fields and fill them in.

    Just a thought.
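A minimal sketch of the server side of this honeypot idea, assuming the submitted form arrives as a plain dictionary of strings. The obfuscated field names (`f_a91` and so on) are hypothetical, and the CSS hiding step is not shown.

```python
# Fields with "normal" names that real visitors never see (hidden via CSS).
HONEYPOT_FIELDS = ("name", "email", "website", "url")

# Hypothetical nonsense names for the real fields, mapped to their meaning.
REAL_FIELDS = {"f_a91": "name", "f_c27": "email", "f_e05": "comment"}


def classify_submission(form):
    """Return ('spam', None) if any trap field was filled in,
    otherwise ('ok', fields) with the real fields decoded."""
    for field in HONEYPOT_FIELDS:
        if form.get(field, "").strip():
            return "spam", None
    decoded = {meaning: form.get(key, "") for key, meaning in REAL_FIELDS.items()}
    return "ok", decoded
```

A bot that fills in every field it recognizes trips the trap, while a human who only sees the visible boxes does not.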

  34. Neil W says:

    This will get fixed one day.

    The next day Shamus will put up a Diecast in which Chris talks for twenty minutes about Steam Shower Simulator and Rutskarn has a roleplaying anecdote about real estate investment, following which Mumbles and Josh have a long and complex argument about sex toys. The moderation queue will be epic.

  35. General Karthos says:

    Here’s a guess why they mention those showers. (I don’t want to use the phrase, because I don’t want to be labeled as spam.) Steam showers are a real thing, if you look it up on google. But because you mention Steam from time-to-time on here, usually as an example of the right way to do electronic stores (as opposed to say… Origin *shudders*) the spambots see the word steam, and decide you must mean those showers. Those posts may have more comments than most, or the word may come up in comments more often.

    So they come on mentioning showers.

    Have I asked if you’ve ever considered the advantages of owning a really fine set of encyclopedias? They would let you right to information bedazzling with many fastidious facts.

    (That might have got me flagged as spam. Sorry for the extra work. I couldn’t resist.)

    1. MichaelGC says:

      I’ve previously carried out entirely unscientific testing which produced tenuous and completely circumstantial evidence that Shamus’ spam harvesting robots hate “Origin.”

      Edit: But this post got through fine! Maybe the harvesters have succumbed to the ME4 &/or DA3 hype…

  36. Someone says:

    Regarding the steam shower units:

    I wonder if this has anything to do with the fact that steam (from Valve) is mentioned here very frequently. By seeing a lot of approved messages with the steam keyword, the filter might get confused.

  37. I don’t know how well it would work on a large scale site as this one, but I use Sweet Captcha and it stops most of the spam (I only get like, one every month or so)

    Here’s the link: https://sweetcaptcha.com/

    1. ET says:

      Oh, man, that’s really cool! Shamus, get this – it’s adorable! :D

    2. fdgzd says:

      That’s … that’s the best antispam measure I’ve ever seen. By which I mean least obnoxious for users. I like it!

    3. Duneyrr says:

      This is great! :D

  38. postinternetsyndrome says:

    One forum I visit sometimes requires you to do some simple math in order to post. Maybe that could work?

  39. Shamus, would it be possible to add a Gravatar checkmark box to the form where you comment?

    Reason I ask is due to stumbling across this:
    http://meta.stackexchange.com/questions/44717/is-gravatar-a-privacy-risk

    Does your website send my email (or apparently a hash of my email) to Gravatar each time I write a post?

    I’d rather have the option to not to.
    Heck if you added a “Use Gravatar” checkbox that defaulted to checked then I’d happily uncheck it every time. (that would preserve current behavior but allow people to opt out).
    It might speed up making posts too?

    1. BTW! There is a way to somewhat anonymize Wavatars, by simply doing md5(email + sitename).
      Down side is that the Wavatars would be unique to this site only.

      I guess a new way to pass the hash to Gravatar could be devised.
      Maybe by passing md5(email + sitename and/or salt) and then pass the sitename, Gravatar could then look up the right site in the database and locate the gravatar hash for the user,
      and those with a Gravatar account could then tie each site specific gravatar hash to their account.
      This would reduce privacy leaking to a minimum. (you would only be able to track someone within the same site rather than across the web).

      I guess if you wanted to be extra clever you could do md5(email + sitename + salt)
      and register the site at Gravatar, that way the salt would become a sort of shared secret between the site and gravatar.

      EDIT:
      Right now my image url is http://0.gravatar.com/avatar/e6a6ea627a280ea46d6002d575414f1c?s=48&d=wavatar&r=PG

      With such a site-specific Wavatar it would be
      http://0.gravatar.com/avatar/9830575eea849093b4c7f55d3e3e9674?site=shamusyoung.com&s=48&d=wavatar&r=PG

      The hashing would be done like this:
      hash = md5([email protected]|shamusyoung.com);
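To make the difference concrete, here is a small sketch of both hashes. The first function follows the usual Gravatar convention of hashing the trimmed, lowercased email. The second is the site-specific variant proposed in this comment; it is a hypothetical scheme, and Gravatar would have to support the `site` parameter for it to actually return an image. The function names are invented for illustration.

```python
import hashlib


def gravatar_hash(email):
    """MD5 of the trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def site_specific_hash(email, sitename, salt=""):
    """Proposed variant: md5(email|sitename), with an optional shared salt.

    Hypothetical scheme from the comment above, not an existing Gravatar feature.
    """
    parts = [email.strip().lower(), sitename.strip().lower()]
    if salt:
        parts.append(salt)
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
```

The same address then produces a different hash on every site, so the avatar URL can no longer be used to follow one commenter across the web.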

      1. ET says:

        OK, after reading the Stack Exchange post, I too, would like either an option to not use the Gravatars, or to have it hashed. ^^;

  40. Bloodsquirrel says:

    Greetings! Thy webhold is a vista of pleasantry, and thy article a trove of wisdom! Related to your efforts is this electronic bazaar, from where your disciples may trade their lucre for devices that use the power of steam to cleanse their bodies, or shoes and pouches from famous artisans!

    1. That was way too intelligible to be real spam, but kudos for not including a URL, though. I sometimes get email spam that advertises something and has no URLs to anything, which makes you wonder if some spammers are just a Turing test gone wrong.

      1. ET says:

        The ones with no links are probably trying to build up a presence on websites as “trustworthy”, before they start the actual spam. You know, to defeat the new hotness in how we deal with spam. ^^;

  41. ehlijen says:

    What about a pun based test?

    Give the users a Rutskarn quote from spoiler warning, ask ‘Is this a pun?’ and make them type either ‘yes’ or ‘no’ or ‘I hate you so much’. Posts with incorrect answers get flagged.

  42. Alec says:

    As a point of interest – my Dad does random manual page breaks.
    He came to PC use at his work later in life (late 60s) and his first word processing and computer experience was an Outlook client with no word-wrapping.
    Now he can’t break the habit of /n every time he feels like the line will run on.

    As a more depressing point of interest, at the start of the article I spent about 30-40 seconds trying to make actual sense of the steam-shower cubicle comment before realising it was an example of spam. I thought he was referring to something from a Spoiler Warning episode.
    So…score 1 for spammers?

    1. rofltehcat says:

      They definitely need more steam showers in Spoiler Warning. I don’t really know if there are actually any games with them but it’d surely be very interesting.

      Maybe some Hitman game might have them? Hide the damn spambot’s corpse in its own Steam shower!

      However, I still don’t understand why anyone would go to the extent of spamming to promote their brand/product. To be honest stuff like that just puts me off instead of raising interest. I can’t imagine things like a slightly better search rank actually doing anything for them. It is even weirder when you look at it on an international basis:
      Why would I ever buy “Canadian pharmacy” products? My healthcare covers everything I could possibly buy from those, not even assuming they’d send you placebos at best and poison at worst.

    2. ET says:

      I used to do line-breaks after every sentence, since it reads easier to me, as a programmer. Out of courtesy to non-programmers, I changed. Probably nobody noticed. :P

      1. Richard says:

        I suppose it could be worse;
        Sometimes I accidentally end every phrase with a semicolon;

        1. Retsam says:

          I wonder if my programming had any influence on my propensity to use semi-colons in my typing or if I’ve just successfully trained myself to avoid the rampant comma-splicing practiced by most of the internet.

  43. Jeff says:

    “My comments would by 90% spam without them.”

    Trade you an “e” for a “y”?

  44. Friend of Dragons says:

    I wonder if these results are replicable…

    1. Friend of Dragons says:

      Chirs: “I love the sense that Skyrim has, the sense that it’s a real, living place…”
      Josh: * BASH BASH BASH *

      1. Friend of Dragons says:

        Apparently not.

      2. Henson says:

        Hey! You misquoted me! Plagiarist!

  45. Chris says:

    Spam filter = Alpha Complex Computer (at least I hope spam filter never grows into another system and start running things)

    1. ET says:

      Friend Computer would at least be more interesting than real life as-is… ^^;

  46. Chris says:

    Just realized that a game exists on this post to see how many can get moderated for spam. Maybe a large enough sample size and we can crack the random number.

  47. MichaelG says:

    I never had your readership, but there was almost no spam on my Disqus comments. Perhaps because it doesn’t generate traffic for the spammers?

    1. Richard says:

      We get loads on the forum we run for our customers.

      Thankfully we’re about to change the back-end, so hopefully the new one will be less brain-dead when it comes to spammers – hint, if the customer has never posted before and the post has got a link in it, it’s either spam or they are asking for help on integrating a specific 3rd party product.

      And no real user has ever actually included that necessary info in their first post when asking for help!

  48. Thomas says:

    Just fixing my default email address

    1. Daemian Lucifer says:

      That one looks so sad.The other one had a monocle.Why wouldnt you want to have a monocle?

  49. Domochevsky says:

    Hm, Shamus, may I suggest replacing all three with just Antispam Bee? It has done pretty well for me so far (together with “new commenters must be approved first”). Maybe it’ll do so for you as well. :)
    (And it’s not like enabling/disabling plugins takes long with WordPress anyway.)

    The results you’re getting here are just disgraceful. >_>

    1. Domochevsky says:

      Oh yeah, the plugin description is in german. Google/Bing translate will likely be required there, but the plugin itself is fully in english.

  50. RCN says:

    Spammers always adapt. So, are they borg, replicators or cylons?

    Anyways, can’t YOU write yourself some rudimentary code to make a third (well, fourth) pass on these comments? I am almost certain it’d be trivial to at least write code that’d block every “A bunch of great guidance on this great site. need a steam shower unit in my bathroom”.

    Though it’d be less trivial to write a tracer that’d find the source of these posts, find their personal e-mail, and flood it with EXABYTES OF PRON.
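The exact-phrase pass suggested in this comment could look something like the sketch below. The phrase list is an assumption seeded with the one example quoted above, and the function names are invented. Whitespace is collapsed before matching so that manual line breaks in the middle of a phrase do not hide it.

```python
import re

# Assumed blocklist, seeded with the phrase quoted in the comment above.
KNOWN_SPAM_PHRASES = [
    "need a steam shower unit in my bathroom",
]


def normalize(text):
    """Lowercase and collapse all runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_known_spam(comment, phrases=KNOWN_SPAM_PHRASES):
    """Return True if any known spam phrase appears in the comment."""
    body = normalize(comment)
    return any(normalize(phrase) in body for phrase in phrases)
```

This only catches word-for-word repeats, so it would sit alongside the existing filters rather than replace them.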

  51. kdansky says:

    If you haven’t done it yet, add a tiny checkbox (“Do not confirm this”) next to the Spammer-Confirm, and throw out any comment that marks it. We normal users won’t click it (you could also hide it behind something else), and bots will probably either mark both boxes or none.

  52. Ninjariffic says:

    I’ve been reading your blog and lurking in the comments for years. I’ve only rarely made a comment myself, but I think I’ve seen the “awaiting moderation” message every time I did. I’ve always wondered what the criteria was (or I guess what it’s supposed to be).

    Also, do you know where I can get a steam shower?

  53. 4th Dimension says:

    test to see gravatar privacy

    1. 4th Dimension says:

      The previous gravatar account is hidden since the username is a hash, this one should be accessible.

      1. 4th Dimension says:

        Confirmed. In order to make your profile unavailable for browsing no matter what you need to use a guid or a hash as your username when you sign up for gravatar. You can use your email to login but using hash disables redirect to your profile since it seems that they decode your md5 hash to get your username.

        1. Not 4thDimension says:

          Let’s see if gravatar uses only email to check for avatar.

          1. 4thDimension says:

            Final test.
            I would like to apologise to Shamus for using his blog for this testing.

  54. Trena says:

    Never ever heard of a steam shower enclosure until I discovered this
    site, so happy I did would like one now and money allowing will be owning one pretty soon
