Wednesday’s usual Nan O’ War episode will appear later this week. I was going to post about the latest talk from John Carmack, but I feel like that kind of post needs to simmer for a few days. So rather than leave this spot blank, I thought we might look at the work of a new spammer to the site.
All of the messages in this post arrived from the same IP address, and all of them within a few minutes of each other. All of them bypassed the various spam filters and appeared on the site where the public could see them. (I manually took them down once I spotted them, obviously.) They managed to properly handle the “Check here if you’re not a spammer” checkbox. They managed to spoof Akismet, which is my main software-based defense against spam. They also successfully got by the common stuff like keyword filters. One or two of them almost got past the ultimate filter, which is my human brain. That’s a pretty good night’s work for a spam bot. (Or perhaps a shameful night’s work for my spam filters.)
Let’s meet our first contestant…
javascript obfuscator
Very funny. Keep up the good work!
The text of this comment is totally legit. It’s not word salad. It doesn’t have screwball formatting or extraneous non-English characters. It’s actually IN English. It’s even been left on a funny post, so it checks out. The giveaway here is the name. Now, on any other site the name “javascript obfuscator” would immediately get you busted as a spam bot. But around here it could plausibly be some programmer’s self-deprecating handle. Something like:
Q: So what’s a javascript obfuscator?
A: Anyone who writes javascript for a living, because JS is self-obfuscating.
But the giveaway here was that the name “javascript obfuscator” was also in their domain name. Still, it was a nice try.
So how did this bot manage to post a coherent comment in the proper context? They cheated and copied off of someone else’s work. That same comment had already appeared on the post.
192.168 ll
The ending of Mass Effect 3 is where the problems culminated, not where they began
The bot tried again just three minutes later. Again, they managed to post a coherent thought. The mistake here was that they quoted the post and not another commenter. Even that might have slipped by if I was being inattentive, but then they used a gibberish name that drew attention to itself. 192.168 is the first part of the default IP of a home router, and it’s just screaming out “someone has configured their spambot incorrectly”.
spanish to english
Very funny indeed
Two minutes later. The pattern here is pretty obvious by this point. Again, this comment is simply quoting someone else. It would have skated right through, except “spanish to english” makes no sense as a username and it matches their URL. If they were named “The Translator” – and if it didn’t come in this rapid-fire bundle of spam – I wouldn’t have given it a second look. “Oh. This guy translates stuff for a living. Good for him.”
bullet force
Can't wait for the next entry!
Same mistakes. The name is odd enough to grab my attention, at which point I notice that it matches the URL[1]. Also, the ripped-off comment doesn’t make sense this time. “I can’t wait for the next entry” made sense four months ago, but it is no longer a plausible response to that post because the next entry has already been posted. 17 of them, in fact.
Last one:
json formatter
It was ridiculous. Even without being able to read the code on the slides, you could tell the steps varied widely in operation count, were often split up and in different order, and just looked different.
Same thing again: a nonsense name that matches the URL, posted too soon after other messages with the same M.O. Also, this one repeated the mistake of quoting the post rather than a comment. That’s actually a spam bot quoting me quoting John Carmack.
I’ve never seen a spambot behave quite like this one before. If I look in my spam filter, nearly everything falls into one of these categories:
- Word salad gibberish.
- Giant walls of meandering text unrelated to my site or the spammer’s URL.
- Non-English.
- Just a big list of URLs.
So this one was kind of refreshing. Hopefully Akismet catches up soon so I don’t have to sort through too many of these by hand.
Footnotes:
[1] Don’t bother looking it up. It’s a garbage multiplayer shooter for mobiles.
It’s actually a fairly old technique (copying other posts to appear legit). It really swarmed a few sites 2-3 years ago. Interesting that they’re only just finding this site (or, more likely, only just figuring out how to get past the anti-spam here).
I’ve had a number of spam comments (caught by the spam filter, though) that copied parts of the post they were replying to. So it looks on topic and like an actual comment until you look through and realize that it’s what I said in the post and ISN’T someone quoting it.
They just need to add in some quote tags, or quotation marks, and then they’re golden.
I’m guessing the clever bit that got this bot through the filters is how short the phrases they copied are. Spam filters probably need to be tuned so that they won’t throw out small, commonly used phrases (though throwing those out might also be desirable against a “FIRST POST” epidemic); otherwise they’ll throw out two people quoting the same meme or something. If my theory is correct, the filter is looking for posts with more than X characters that are identical to other content.
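If that theory is right, the filter might behave something like this minimal sketch: flag a comment only when it shares a long enough run of characters with text already on the page. The 40-character threshold and the function names are hypothetical, and nothing here is known to be how Akismet actually works; it just illustrates why short copied phrases could slip through.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: shared runs shorter than this are treated as
# ordinary common phrasing ("lol", "Very funny") rather than copying.
MIN_COPIED_CHARS = 40


def longest_shared_run(new_text: str, existing_text: str) -> int:
    """Length of the longest run of characters that appears in both texts."""
    # autojunk=False stops difflib from ignoring frequent characters in long texts.
    matcher = SequenceMatcher(None, new_text, existing_text, autojunk=False)
    match = matcher.find_longest_match(0, len(new_text), 0, len(existing_text))
    return match.size


def looks_copied(new_text: str, existing_texts: list[str],
                 min_chars: int = MIN_COPIED_CHARS) -> bool:
    """Flag a comment if it repeats a long enough run from any existing text."""
    return any(longest_shared_run(new_text, old) >= min_chars
               for old in existing_texts)


if __name__ == "__main__":
    page = [
        "Very funny. Keep up the good work!",
        "The ending of Mass Effect 3 is where the problems culminated, "
        "not where they began",
    ]
    print(looks_copied("Very funny indeed", page))  # False
    print(looks_copied("The ending of Mass Effect 3 is where the problems "
                       "culminated, not where they began", page))  # True
```

With that hypothetical threshold, even an exact copy of “Very funny. Keep up the good work!” (34 characters) would sail through, which fits the guess that short phrases are what got past the filters.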
What they do is good enough for most blogs/bloggers. In my case, I get so few comments that I’m almost always going to read any legitimate one in detail, and so even sticking the quote on won’t help.
At least the first time, though, it gave me a kind of surreal “Hey, that’s an insightful and well-reasoned comment … well, it should be, it’s what _I_ wrote!” moment.
Throw in a ‘lol’ at the end. Indistinguishable from human.
lol
lol
Wouldn’t this just result in a spam loop of “lol”s, though? Sooner or later Akismet or some other anti-spam filter would have to block such posts as well…
I don’t see a problem with that.
Clearly this is where we are ultimately headed: https://xkcd.com/810/
Wouldn’t that just end up clogging the system with a large number of spam users whose whole job is to upvote other spam messages as “constructive”? The easiest way to do so (in keeping with the spammer community spirit!) would probably be to just upvote EVERY comment as “constructive”, ultimately marginalizing and breaking the system.
And if the system tries to counter by increasing the number of “constructive” ratings required to a certain minimum threshold, the spam group could pull the plug by having some sort of a back-end database solution wherein it keeps a list of posts that are spam, and only upvotes those posts. While it wouldn’t help spam get through the system (especially if there were several spam groups, which would cut themselves off into only upvoting their own spam), it would definitely stop any legitimate comments from going through, as those certainly wouldn’t have a base of support from the sham spam accounts.
And if the system tries to counter THAT by banning spam upvoters, they could randomize their patterns by upvoting only some spam while also upvoting some (perhaps a smaller number) of legitimate comments. This would then require a new system to recognize those spam upvoters, which may or may not be harder than just recognizing spam…
This problem just seems like a never-ending battle more than one with a definitive solution.
You had the opportunity for the perfect joke, but now it’s too late.
Very funny. Keep up the good work!
That’s dedication to a gag. Bravo!
Dear god, they are EVOLVING!
…
Why is it always the crappy movies that I remember well enough to reference?
Shamus wrote:
Shamus wrote:
json formatter
It was ridiculous. Even without being able to read the code on the slides, you could tell the steps varied widely in operation count, were often split up and in different order, and just looked different.
We are the spam. Turn off your filters and surrender your blogs. We will add your text and your valuable content to our own. Your audience will adapt to service us. Resistance is futile.
All your blog are belong to us.
Shamus – “[Insert Moby-Dick quote here]”
This was the most entertaining of the “meet the spambots” posts :>
(Feel free to quote me on that, whether you are human or a spambot :> )
Shamus,
If the spammers are commenting on old posts, maybe you could just auto-lock comments on posts older than X weeks? I was also going to suggest moving the comments somewhere that requires a login, so a large company can deal with the spam, because surely they can deal with spam effectively! Then I spent 5 minutes reading about the plague of spam on Reddit…
Not gonna work here, because some people do leave comments on those old posts because they are new readers. Some of those comments are even insightful. Heck, even Shamus has responded to a couple of comments recently that were made on posts that are years old.
Forgive my ignorance, but what is the purpose of these spambots simply copying and pasting text from other posts? I guess I’m having trouble understanding the endgame. Does it give the programmers access to something on Shamus’ site? Since it seems like a great deal of work to bypass the various filters on this site (and others), what do they have to gain by doing this? For instance, I will often see in some comment threads the (obligatory as a guitar in the background of a scene in a movie) post claiming that “my x made thousands of dollars by doing y.” Do people actually click on those links?
If I understand their methods correctly, having their links floating around improves their SEO. Basically, these are bots designed to trick other bots into displaying those links higher when people look stuff up on Google.
Go check out Jim Sterling’s video on gambling websites. Basically, they don’t care if anyone clicks because just having the link out there apparently improves their Google Search ranking.
Gotcha. Thank you for clearing that up for me.
Now, I’m not a programmer so forgive me if this is naive – if they figured out how to check the “Check this if you aren’t a spammer” box, could you theoretically make two boxes, one that says “check this if you aren’t a spammer” and one “check this if you *are* a spammer” and thus fool them until they recalibrate the spambot?
Honestly, I wonder WHY they got a spambot to check the box. Your site is the only site I’ve seen this simple feature on, and I don’t think a spam-bot-maker would teach their bots to circumvent something not widely used.
I remember in a previous post it talked about doing something like that; possibly even ‘hiding’ the second so we won’t see it but the bots will. Check boxes are becoming common enough that a site I follow uses a two-position slider.
It’s a technological arms race, with the smaller sites (like here) seeking protection from the major powers (Facebook/Twitter/etc integration).
“Your site is the only site I've seen this simple feature on”
I used a variation of that a decade ago on a forum I used to run–actually, though, it was the opposite: a checkbox created by mildly obfuscated Javascript that, IIRC, was automatically checked. Spambots of the day didn’t run JS so they wouldn’t check it, and the server would do everything the same with posts that didn’t have the box checked except for actually write the post to the DB, so the bot would theoretically think its post was actually posted.
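The server side of that trick is simple enough to sketch. This assumes a hypothetical field name, js_check, for the box the script creates, and relies on browsers submitting a checked checkbox that has no value attribute as "on". It's an illustration in Python, not the forum's actual code.

```python
# Hypothetical name for the checkbox that only the page's JavaScript creates.
JS_CHECKBOX_FIELD = "js_check"

REPLY = "Thanks, your post has been submitted."


def handle_post(form: dict[str, str], database: list[str]) -> str:
    """Store a forum post only if the JavaScript-created box came back checked.

    The reply is the same either way, so a bot that never ran the script
    gets no hint that its post was silently dropped.
    """
    if form.get(JS_CHECKBOX_FIELD) == "on":
        database.append(form.get("body", ""))
    return REPLY


if __name__ == "__main__":
    db: list[str] = []
    human = handle_post({"body": "Great post!", JS_CHECKBOX_FIELD: "on"}, db)
    bot = handle_post({"body": "cheap pills"}, db)
    print(human == bot)  # True
    print(db)            # ['Great post!']
```

The identical reply is the important design choice: a bot that gets an error can be retuned, while one that gets a cheerful “submitted” has no reason to change.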
I’ve seen a few other sites lately do a variation: “what color is an orange?” and a box for you to type “orange”.
“I’m colorblind, you monster.”
…I always fail those boxes.
“Well, actually an orange doesn’t inherently have a color, it’s only the light reflecting off the orange that causes it to appear a particular way to us… “
Also, how do we distinguish orange from yellow?
Ah, category error!
How long before we go back to the game code sheets of yore?
“What was the third word of the fifth paragraph in the post that Shamus made on December the thirteenth of two thousand and eleven?”
Sadly, that post only had two paragraphs, so I guess we’re stuck.
Quick! To BlogFAQs!
This comment was saved from the spamfile. It was in the deep spam, where I never see it unless I know to look for it. I have no idea why, given Philadelphus has left 332 previously-accepted comments and there’s nothing obviously wrong with this one.
And yet I managed to make a Spammer S. Spammaron comment without a hitch.
Come to think of it…I vaguely remember forgetting to check the checkbox on a comment recently, getting a notice that I hadn’t done so, then checking the box and reposting it. It might’ve been this comment, which would explain why the spamfilter thought I was a sneaky spambot in disguise.
…Wow, I think my previous currently-in-limbo comment is the first comment I’ve ever had the spamfilter eat. What an astonishingly appropriate post for it to happen on!
Well, the humorous (?) part of sites that do this particular check is that there’s only one question.
While we’re talking about the “I’m not a spammer” checkbox, would it be possible to fix the tabindex on the checkbox so that we can tab to it properly?
I’m the sort that generally uses the keyboard as much as possible, and the fact that I can’t just hit “[Tab][Tab]Space” to check the checkbox (and then “[Shift-Tab]Enter” to submit) is a minor annoyance.
It looks like you’d just need to set tabindex="8" on the checkbox input tag.
I believe there actually is an “I am a bot” box, which is invisible but still selectable by bots who aren’t viewing the page the same way web browsers do.
No, there isn’t. Looking at the script, it just adds a single checkbox dynamically. Bots that aren’t running JS won’t “see” the checkbox, so they won’t know to check it and so their comment won’t be accepted.
There is supposed to be a hidden field called “email”, which will get you flagged as a robot if you put anything in there.
I wonder if “autocomplete” plugins/features might do that by accident, thus flagging a human as a spambot.
Almost certainly not. The hidden input is a special type of input that specifically exists for values the user is not supposed to modify. (It’s not an invisible textbox like you might expect from “hidden input”.)
There’s no sane reason for an autocomplete feature to attempt to modify hidden inputs, because the whole point is that they’re not supposed to be modified by users.
Ah, yeah, there is an <input type="hidden" name="gasp_email"> in there. It’s not really a “hidden checkbox” but I guess it’s not a terrible way to think of it. (In reality “hidden input” is a normal mechanism that websites use to send along extra data with a request, data that isn’t entered by a human.)
I’m coming to appreciate how clever and multi-layered this anti-spam is. If a bot is just scraping the raw HTML and parsing it to make raw HTTP POST requests (as I suspect many do), it’ll see the bogus email field and assume it’s supposed to be filled out (e.g. by JS). If it’s actually running some sort of headless browser without JS, it won’t see the checkbox that it needs to check. It would have to be running a full headless browser with JS enabled to actually see the checkbox and check it.
… or alternatively, they might just be programmed to be aware that the GASP anti-spam plugin exists, and recognize that a “gasp_email” field is bogus. I wouldn’t be surprised if more sophisticated bots are specifically programmed with countermeasures for the more common anti-spam plugins.
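The two layers described above can be sketched as a single server-side check. “gasp_email” is the hidden field named in this thread; the checkbox field name is a hypothetical stand-in, and this is only the logic as the comments describe it, not the GASP plugin’s real implementation.

```python
# "gasp_email" is the hidden honeypot input mentioned in the thread; the
# checkbox name below is hypothetical, not GASP's real field name.
HONEYPOT_FIELD = "gasp_email"
CHECKBOX_FIELD = "not_a_spammer"


def classify_submission(form: dict[str, str]) -> str:
    """Return 'spam' or 'ok' using the two layers described in the thread."""
    # Layer 1: a scraper that fills in every input it finds in the raw HTML
    # will put something into the hidden field that a human never touches.
    if form.get(HONEYPOT_FIELD, "").strip():
        return "spam"
    # Layer 2: the checkbox only exists after JavaScript runs, so a client
    # that doesn't execute JS never submits it.
    if form.get(CHECKBOX_FIELD) != "on":
        return "spam"
    return "ok"


if __name__ == "__main__":
    print(classify_submission({"comment": "Nice!", CHECKBOX_FIELD: "on"}))  # ok
    print(classify_submission({"comment": "Nice!", HONEYPOT_FIELD: "a@b.c",
                               CHECKBOX_FIELD: "on"}))  # spam
    print(classify_submission({"comment": "Nice!"}))  # spam
```

Only a client that runs the page’s JavaScript, ticks the box, and leaves the hidden field alone gets “ok”, which is why a bot that knows about the plugin by name is the weak spot.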
That’s certainly the second thing I would do if I were required by my job to program a spam bot.
But since the first thing I would do is quit that job, I’m guessing that’s kind of moot.
As it stands, these are technically the only posts I have time to read and comment on, because they’re not analytical. I’m trying to curb article reading while I finish reading some philosophy books and start a new job.
Other than that, I perfectly understand this master spammer. Clearly, he has spent years preparing for this moment to spam you and use your own words against you.
FWIW, I frequent a site where a commenter named “Serbian to Vietnamese to French and back” routinely responds to troll/idiotic comments by running them through Google Translate repeatedly (guess which languages he uses) and posting the results.
Also, to save somebody the trouble:
I broke the site where the commentator called “Serbia in Vietnam in France and in the back” habit corresponding to troll / stupid comment for them to go through Google Translate occasions (guess which language it uses) and the game results.
I read the first paragraph, then read the second and thought “What gibberish!”. It actually took me a minute. lol
I’m actually kind of surprised that nonsense didn’t kick the comment into moderation.
I think simple but custom “I’m not a robot” tests are much more effective than anything you can just download and install. The problem with your “check this if you are not a robot” test is that a random input has a 50% chance of guessing right. An answer to a task like “write the first letters of all the words of the following statement: I AM NOT A ROBOT” is much less likely to be filled in correctly by a robot that wasn’t written specifically to spam you.
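A minimal sketch of that challenge, assuming the expected answer is just the initials with spaces and case ignored. It also works out the blind-guess odds, assuming a bot that guesses each letter uniformly from A to Z: 1 in 2 for the checkbox versus 1 in 26^5 for five letters. The function names are made up for illustration.

```python
def expected_answer(statement: str) -> str:
    """First letter of each word, e.g. 'I AM NOT A ROBOT' -> 'IANAR'."""
    return "".join(word[0] for word in statement.split()).upper()


def check_answer(statement: str, answer: str) -> bool:
    """Compare case-insensitively and ignore spaces a human might type."""
    return answer.replace(" ", "").upper() == expected_answer(statement)


if __name__ == "__main__":
    challenge = "I AM NOT A ROBOT"
    print(expected_answer(challenge))        # IANAR
    print(check_answer(challenge, "ianar"))  # True
    letters = len(expected_answer(challenge))
    # prints: checkbox: 1 in 2, five letters: 1 in 11,881,376
    print(f"checkbox: 1 in 2, five letters: 1 in {26 ** letters:,}")
```

Swapping in a different statement now and then would keep a bot written specifically for the site from simply hard-coding “IANAR”.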