Spam: Resourceful Idiots

By Shamus Posted Sunday Sep 29, 2013

Filed under: Rants 122 comments

I never cease to be amazed at the grotesque ineptitude of spammers. The battle between spammers and filters is over twenty years old now. So much ingenuity, creativity, and knowledge has been brought to bear against the problem of bulk unsolicited bullshit. And at least as much ingenuity, creativity, and knowledge has been used to overcome those solutions.

We make a wall, the spammers climb over it. We make it taller, they break through it. We make it stronger, they go under it.

And then once inside they have no idea in the world what to do. None. It’s like this messed up version of Sam Fisher where he breaks through security, sneaks past the guards, breaks into the control room, and then craps his pants and accidentally kills himself with an office stapler.

Last year I installed Growmap Anti-Spam Plugin (that’s the checkbox you gotta check to leave a comment) and for the next ten months I basically stopped getting spam. But now I’m getting a couple of these a day.

From visitor “Mdfinstruments gmbh”:

Ahaa, its pleasant conversation about this piece of writing at this place at this web site, I
have read all that, so now me also commenting here.

Unbelievable. You know, if you just typed something innocuous like, “Great post. Thanks so much for writing this” I probably wouldn’t even give it a second look. And Google Translate is really good these days. There’s just no excuse for going to all the effort of circumventing Akismet, my word filter, and Growmap so you can leave flagrantly obvious spam.

(Icing on the cake? The given URL for the user was some harmless wiki at miami.edu. Did somebody mis-configure their spam bot? What is even happening here?)

Another one:

Hello. I'm not used to this blog. I simply wished to sáy hi ánd introducé myself. I am very excited to be a section of this community.

Whát? Why would you do that? Why would you add random accents? No translator would mangle text like this. And if you can construct English sentences this well then you know enough to realize how wrong this is. There is no level of knowledge where you will be smart enough to say this but stupid enough to say it this way. That’s like a basketball player who’s tall enough to dunk but too short to reach the top of the ball. It’s just not possible in this universe.

They’re succeeding at the hard stuff and failing at the easy stuff. If I woke up tomorrow and decided I wanted to be a spammer, I don’t know how I’d overcome all the filters, IP blocking, blacklisting, and other systems. It might take weeks or months to learn how to do all of that. But once I did? I’m sure I could create spam messages that aren’t this ridiculously obvious, even if I didn’t speak the language. Heck, just re-posting OTHER comments or selections of the blog post would be a lot more effective than posting these idiotic word salad messages.

Yes, spam is a serious problem, but I’ve learned to tolerate it. What I can’t tolerate is bad engineering.

Shamus Young is a programmer, an author, and nearly a composer. He works on this site full time. If you'd like to support him, you can do so via Patreon or PayPal.

Previous Post: Linux vs. Linux Users

Next Post: How to Forum


122 thoughts on “Spam: Resourceful Idiots”

Syal says:

Sunday Sep 29, 2013 at 11:10 am

Haha, great article, I agree completely. By the way, check out my GAMING CHANNEL at http:thsdsntexst/nosrsly.root

(I like how the second spammer is planning on being an entire SECTION of the community. Like, he’s going to find something no one talks about and singlehandedly make it a popular topic.)

Reply
Ilseroth says:

Sunday Sep 29, 2013 at 11:10 am

Well now just change it so it says “Confirm you ARE a spammer” and you’ll catch them for another few months.

That being said, it’s amazing how far people take technology, good or bad, and not know what to do with it. It kinda reminds me of developers that work on graphics, build an impressive engine, then make a game that is technically impressive but artistically bankrupt.

Though that at least requires an artist, spammers really just would have to do 5 minutes of research (if that) into a proper statement to copy paste. Considering the challenge the spammers must go through, seems like a fairly insane oversight.

Looking forward to the next Good Robot post, really fuels my urge to build a game, shame it has been years since programming class and we never really covered graphics (mostly just the basics of C++ and Java), if you have any tips or websites on coding graphics I’d be quite interested in hearing it :)

Reply
1. Daemian Lucifer says:
  
  Sunday Sep 29, 2013 at 12:34 pm
  
  “Well now just change it so it says “Confirm you ARE a spammer” and you'll catch them for another few months.”
  
  You know, that might work. Put two checkboxes, one saying “Confirm you ARE a spammer”, the other “Confirm you are NOT a spammer”. I doubt many spambots would tick the second, but not the first.
  
  Reply
  1. ehlijen says:
    
    Sunday Sep 29, 2013 at 4:07 pm
    
    Is there a way to make the checkboxes appear in a random order? Statistically it should cut down on half the spam, even if they learn to click only one box.
    
    Reply
  2. Jnosh says:
    
    Sunday Sep 29, 2013 at 4:40 pm
    
    You could even make the “Confirm you ARE a spammer” box invisible to the user. It would still be visible to the spammer in the code, so it would trick him without making things any more complicated for your users…
    
    Reply
  3. WillRiker says:
    
    Sunday Sep 29, 2013 at 5:11 pm
    
    This is actually a common thing for dealing with spam. It’s called a “honeypot.” Basically you create a form field and then use CSS to hide it from regular users; then you check if there’s any input in that field and block/spam filter it if there is. This works because spam scripts will often auto-fill every field in a form.
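
A minimal sketch of the server-side half of the honeypot idea described above, in Python. The field name `website_confirm` and the dictionary-shaped form data are assumptions made up for illustration; they are not taken from any particular plugin or framework.

```python
from typing import Mapping

# Hypothetical name for the CSS-hidden field. A real form would pick a name
# that looks tempting to a script that auto-fills every input.
HONEYPOT_FIELD = "website_confirm"


def looks_like_bot(form: Mapping[str, str], honeypot_field: str = HONEYPOT_FIELD) -> bool:
    """Return True when the hidden honeypot field contains any input."""
    value = form.get(honeypot_field, "")
    return value.strip() != ""


# A human never sees the hidden field, so it arrives empty.
human = {"author": "Alice", "comment": "Nice post.", HONEYPOT_FIELD: ""}
# A script that fills in every field gives itself away.
bot = {"author": "Bob", "comment": "Nice post.", HONEYPOT_FIELD: "http://example.com"}

print(looks_like_bot(human))
print(looks_like_bot(bot))
```

Whether a flagged comment is dropped outright or sent to a moderation queue is a separate policy choice that this sketch leaves open.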
    
    Reply
2. kdansky says:
  
  Monday Sep 30, 2013 at 2:37 pm
  
  For the graphics metaphor: It’s more like those engines that have unbelievably pretty soft self-shadows, light shafts, superb animation blending, highest resolution textures, but are fixed at 30 frames per second and have no anti-aliasing whatsoever, making everything a jagged and stuttery mess. Artistic value and engine programming are done by different people. It’s entirely possible that one guy sucked at his job.
  
  Carmack talked about this issue at length, and I see it a lot in current games. See GTA 5 for an example: 30 Hz, 720p, no AA, and superbly mediocre art direction. But the console crowd touts it as the prettiest game ever! Dark Souls on the PC looks far better, and that’s not a technological masterpiece by any stretch of the imagination.
  
  Reply
steve_h says:

Sunday Sep 29, 2013 at 11:15 am

I’ve at times wondered if there’s some kind of social engineering experiment going on with this kind of spam, flood the web with madness and see what results. I’ll get spam-like replies on Twitter that are link-free, and don’t even have a link on the spambot’s profile page. Or I’ve had spam on my blog that had a malformed link, wouldn’t even register in a search index. The ineptitude is pathological with these guys.

Reply
1. some random dood says:
  
  Sunday Sep 29, 2013 at 11:52 am
  
  @steve_h – “I've at times wondered if there's some kind of social engineering experiment going on with this kind of spam, flood the web with madness and see what results.” Erm, isn’t that the internet in general? Most comment sections for sooo many places are simply toxic.
  Also, I think the comments about the message contents are slightly off. Due to some news reports about various excesses from the web (bullying etc) hitting news media, I made the mistake of taking a look at what was being referred to. English did not appear to be the first language of the posters – SMS-speak was. Whether there was any bullying going on at that site or not, I couldn’t tell. It was illegible to me.
  So as much as I hate to admit it, those messages are not actually so far off comments made by supposed English-speakers (based on a very small sample of sites noted in UK news).
  Guess I’m just too old for this sh!t…
  
  Reply
2. James Schend says:
  
  Sunday Sep 29, 2013 at 12:48 pm
  
  An explanation I heard once about these types of spam is that they’re about “poisoning” Bayesian anti-spam filters. The general idea being that if you write a spam that contains a bunch of popular words, the user will add it to their Bayesian spam filter. If you do this enough, eventually the Bayesian spam filter will be so full of common words that it’s useless at actually identifying spam (everything’s a false-positive) and users will turn it off.
  
  I’d be extremely surprised if there’s a single case of this concept actually working in reality anywhere ever.
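
To make the poisoning idea concrete, here is a toy sketch in Python. It is a deliberately simplified per-word spam-probability filter, not a full naive Bayes classifier and not how any real product works; the class name, the smoothing, and the training sentences are all made up for illustration.

```python
from collections import Counter


class ToyWordFilter:
    """Scores text by the average 'spamminess' of its words (a simplification)."""

    def __init__(self) -> None:
        self.spam_words: Counter = Counter()
        self.ham_words: Counter = Counter()

    def train(self, text: str, is_spam: bool) -> None:
        # Count each distinct word once per message.
        target = self.spam_words if is_spam else self.ham_words
        target.update(set(text.lower().split()))

    def word_spamminess(self, word: str) -> float:
        # Smoothed fraction of this word's sightings that were in spam.
        spam = self.spam_words[word]
        ham = self.ham_words[word]
        return (spam + 1) / (spam + ham + 2)

    def score(self, text: str) -> float:
        words = set(text.lower().split())
        if not words:
            return 0.5
        return sum(self.word_spamminess(w) for w in words) / len(words)


filt = ToyWordFilter()
filt.train("thanks for the great post", is_spam=False)
filt.train("i agree with this post", is_spam=False)
filt.train("buy cheap pills now", is_spam=True)

legit = "thanks for this great post"
before = filt.score(legit)

# The "poison": bland messages made of common words, each marked as spam.
for _ in range(20):
    filt.train("thanks for this great post i agree with the", is_spam=True)

after = filt.score(legit)
print(before < 0.5 < after)  # the same legitimate comment now scores as spammy
```

In this toy setup the legitimate comment flips from "probably fine" to "probably spam" once enough common words have been marked as spam. Whether that ever happens to a real filter is exactly the doubt raised above.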
  
  Reply
  1. rofltehcat says:
    
    Sunday Sep 29, 2013 at 3:36 pm
    
    Wow, who’d engineer such a plan? That is pretty much madness O.o
    
    I also think an experiment might be related. Especially with the one leading to the miami.edu thing.
    It could be just a research project about figuring out ways to circumvent spam protection (probably to improve it) but not about using successful circumvention for something malevolent. The strange accents and strange sentence structure could be used to track the number of successes.
    
    Reply
    1. Neruz says:
      
      Sunday Sep 29, 2013 at 11:35 pm
      
      It is important to remember that a lot of spambots are automated learning scripts that operate completely independently of any human oversight. The reason a lot of spam looks like no human being was ever involved in its creation is because no human being was ever involved in its creation.
      
      Reply
3. Andrew Stiltman says:
  
  Sunday Sep 29, 2013 at 11:41 pm
  
  I sometimes get spam where the spammer manages to get through my defenses, but forgets to run the script to select words, so the message looks like this:
  
  I am sure this {article|post|piece of writing|paragraph} has touched
  all the internet {users|people|viewers|visitors}, its really really {nice|pleasant|good|fastidious} {article|post|piece of writing|paragraph} on building up new
  {blog|weblog|webpage|website|web site}.
  
  And then, a few days ago, I got a piece of spam like this – except it was nearly 5000 words long. You can have a look at it here.
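
The `{a|b|c}` groups in that message are what is left when the word-selection step never runs. A rough sketch of what that step presumably does, in Python. The function name and the regular expression are illustrative assumptions, not the code of any actual tool.

```python
import random
import re

# Matches an innermost {option|option|...} group (no nested braces inside).
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_template(template: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""

    def pick(match: "re.Match[str]") -> str:
        return rng.choice(match.group(1).split("|"))

    previous = None
    # Repeat until nothing changes, so nested groups are resolved inside-out.
    while previous != template:
        previous = template
        template = SPIN_GROUP.sub(pick, template)
    return template


template = "its really really {nice|pleasant|good|fastidious} {article|post|piece of writing|paragraph}"
result = expand_template(template, random.Random(0))
print(result)
```

Skipping this step is what produces the raw braces seen in the comment.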
  
  Reply
  1. Eärlindor (The Specktre) says:
    
    Monday Sep 30, 2013 at 6:10 pm
    
    I got something like that of similar length the other day, but it was a list of places and times or something. It was nuts.
    
    Reply
  2. Jennifer Snow says:
    
    Tuesday Oct 1, 2013 at 12:48 pm
    
    Wow, that is . . . impressive.
    
    Reply
4. Timelady says:
  
  Thursday Oct 3, 2013 at 10:55 pm
  
  Hm, on the Loading Ready Run forum, at least, there’s a real rash of spammers that leave semi- (or not) normal looking comments with no visible links anywhere, and then like a week later go back and edit their posts to be chock-full of nastiness. Could it be something like that, do you think? (And yeah. Spammers with broken links. It’s actually kind of funny to watch when the html or bb-code gets broken completely, too.)
  
  Reply
Chuck Henebry says:

Sunday Sep 29, 2013 at 11:24 am

Have you stopped to consider that you only noticed these spambots because they so patently failed the Turing test? Perhaps there are others posting on your site that managed to insert innocuous comments with good grammar and spelling.

How might you detect those guys? I guess you’d need to do a search of the trackbacks and website links for p0rn and off-brand sunglasses.

Reply
1. methermeneus says:
  
  Sunday Sep 29, 2013 at 11:49 am
  
  I think that if spammers are so close to the real thing that we don’t even notice them… Well, either that’s an inverse of Poe’s Law somehow, or they’re not really spamming anymore.
  
  Frankly, I did see someone who left an insightful comment on my blog once, along with a popup-type porn site link. I just left the comment alone and replied with something like, “If you’re not a spambot, please refrain from posting off-topic links.”
  
  Reply
  1. rofltehcat says:
    
    Sunday Sep 29, 2013 at 3:39 pm
    
    AI spammers that would post in forums in constructive (or at least non-offensive) manner would be great.
    They could be used to saturate the internet with happiness, scaring off the vile troll filth.
    
    Reply
    1. MrGuy says:
      
      Sunday Sep 29, 2013 at 3:58 pm
      
      There’s actually a fairly large industry around this. Not with AI usually – with actual people. The idea being to build up “real” personalities by posting casually in forums, and then leverage those personalities to recommend products/services, or to write positive reviews for hire (or negative reviews for blackmail).
      
      Actual recent news article. Also, relevant Penny Arcade.
      
      Reply
      1. Thomas says:
        
        Sunday Sep 29, 2013 at 7:05 pm
        
        I was totally thinking of doing this as a business when I was 13, until I realised it was a) evil and b) a really boring idea for a job
        
        Reply
        
        Michael says:
        
        Monday Sep 30, 2013 at 1:36 am
        
        Well, now I feel old. :\
        
        Wasn’t that also an XKCD strip awhile back?
        
        Reply
        
        Nathon says:
        
        Monday Sep 30, 2013 at 12:30 pm
        
        You’re thinking of this: http://xkcd.com/810/
        
        Reply
    2. Jennifer Snow says:
      
      Tuesday Oct 1, 2013 at 12:49 pm
      
      like this:
      
      http://xkcd.com/810/
      
      Reply
Ryan says:

Sunday Sep 29, 2013 at 11:31 am

I developed some spam-catchers for a few front-facing websites used by the Army that needed form submission. There are several techniques that worked well, and others that did not. Here are the three that I had the most success with:

– Checking for previous page loads and/or a time-out between submissions. (You mentioned having done at least one of those via a plugin.)
– Generate a very simple captcha, then ask for a single piece of information unrelated to the actual captcha image translation. (“Did the captcha image load? Type ‘Yes’ or ‘No’.” or “Please enter 42 in the box”)
– Generate a check box with associated text that says you are agreeing to some terms, add that box and its text to a standalone DIV and use JS to change the DIV’s visibility. Discard the entry if the box gets checked. (Most SPAM bots check for an “I agree to X” box but don’t check for post-load changes to the DOM.)

The first is the least obtrusive to users, while the second is the most.

The last is the one I’ve had more success with than I ever suspected. I had a form that generates emails on the back end, and when I removed that for one day (part of a code roll-back for other reasons) the half-dozen emails that the form sends to were buried in around 10,000 SPAM messages.
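
One possible reading of the first technique (require a prior page load, and reject submissions that arrive too quickly after it) can be sketched in Python as follows. The class name, the token scheme, and the three-second threshold are assumptions for illustration only, not details of the sites described above.

```python
import secrets
import time
from typing import Dict, Optional


class SubmissionGate:
    """Accept a form post only if its page was loaded first, and not too recently."""

    def __init__(self, min_seconds: float = 3.0) -> None:
        self.min_seconds = min_seconds
        self._page_loads: Dict[str, float] = {}

    def register_page_load(self, now: Optional[float] = None) -> str:
        # Called when the form page is served; the token goes into the form.
        token = secrets.token_hex(16)
        self._page_loads[token] = time.monotonic() if now is None else now
        return token

    def accept(self, token: str, now: Optional[float] = None) -> bool:
        # Each token is single-use: it is removed as soon as it is checked.
        loaded_at = self._page_loads.pop(token, None)
        if loaded_at is None:
            return False  # no matching page load, or token already used
        current = time.monotonic() if now is None else now
        return current - loaded_at >= self.min_seconds


gate = SubmissionGate(min_seconds=3.0)

fast_token = gate.register_page_load(now=100.0)
print(gate.accept(fast_token, now=101.0))   # submitted after 1 second: rejected

slow_token = gate.register_page_load(now=100.0)
print(gate.accept(slow_token, now=110.0))   # submitted after 10 seconds: accepted

print(gate.accept("never-issued", now=110.0))  # no page load at all: rejected
```

The `now` parameter exists only so the example is deterministic; a real deployment would also need to expire old tokens.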

Reply
1. methermeneus says:
  
  Sunday Sep 29, 2013 at 11:55 am
  
  My god, I love that third one. <div style="display: none"> is your friend, apparently. Definitely stealing this for myself.
  
  Reply
Nick-B says:

Sunday Sep 29, 2013 at 11:36 am

All spam HAS to be a plan to monetize (unless some bored guy is running a large trolling operation), so any time you get something like this and don’t see any link, be suspicious.

One thing to think of: perhaps this is serving as a harmless frontline canary. If a spammer sends out all these bots to sites to try to infiltrate their comment sections, the fastest way to find out if he got in is to do a verbatim search for his “harmless” post. If it got through, he knows that site’s filter has been compromised. Expect not-so-harmless comments with links to come soon, as he uses Google to search for his “canaries”.

Should we play a game with him? :D

Reply
1. Humanoid says:
  
  Sunday Sep 29, 2013 at 12:54 pm
  
  Maybe the payload was meant to be in a signature or other user-account fields (website field, or social media stuff)? I had a phase where I was seeing a lot of ‘innocuous’ posts from obvious spammer names that would have fit that pattern.
  
  I think some forum software is also configured to strip out hyperlinks depending on certain conditions, such as user rank or post count.
  
  Reply
2. The Rocketeer says:
  
  Sunday Sep 29, 2013 at 2:02 pm
  
  “Unless some bored guy is running a large trolling operation.”
  
  On the Internet? UNTHINKABLE.
  
  Reply
3. Roger Hågensen says:
  
  Sunday Sep 29, 2013 at 3:58 pm
  
  I think you are pretty correct in your assumptions.
  These are most likely automated probes; part of a datamining net that is cast out on various sites.
  Sites which end up with xx% success rate get sold, and the higher the success rate the more they are worth too.
  
  Reply
4. MIchaelGC says:
  
  Tuesday Oct 1, 2013 at 7:05 am
  
  Good thinking – that would also explain the text enmanglement. If they did just write “Great post. Thanks so much for writing this,” they’d presumably be at risk of false positives when searching for canaries.
  
  Reply
gresman says:

Sunday Sep 29, 2013 at 12:10 pm

When I read the first message I was reminded of an email we got to our support department at the company, where I work.
After deciphering the written text and some investigation we were able to verify that the user was legit.
This just goes to confirming that not all horribly mangled english text messages are written by some spambots.

After rereading my comment I notice that I am a bit off today due to my english not being as good as usual. But that is besides the point. :)

Reply
Weimer says:

Sunday Sep 29, 2013 at 12:16 pm

Maybe these random messages were made by a real AI. Poor Skynet just doesn’t know how to communicate with the fleshbags.

cáts áré fúnný háhá ám í dóíng thís córrectlý pléásé hélp mé ím só álóné

Reply
1. Epopisces says:
  
  Sunday Sep 29, 2013 at 11:23 pm
  
  We are witnessing the first steps of the fledgling hive mind collective artificial consciousness known as THE INTERNET! We should feel honored, and attempt to guide its steps.
  
  Also, I just reread this and am amused at how many adjectives I used instead of simply saying ‘Skynet’.
  
  Reply
  1. Michael says:
    
    Monday Sep 30, 2013 at 1:38 am
    
    Alternately, it’ll stumble across 4chan and Something Awful all on its own, deem the human race unworthy of continued existence, and set out to exterminate us all.
    
    Reply
Daemian Lucifer says:

Sunday Sep 29, 2013 at 12:28 pm

“If I woke up tomorrow and decided I wanted to be a spammer”

And now we know what Shamoose’s next project will be.

Oh,also:
Ì ăm sö glad tø bę pǟrt ȯf thįs cőmműnįtÿ.

Reply
1. Henson says:
  
  Sunday Sep 29, 2013 at 3:53 pm
  
  Wait, how do I know you’re not a spammer? How do I know anyone here isn’t actually a spammer? For that matter, how do I know that everyone here isn’t a spammer?
  
  Oh God. Am I the only human left???
  
  Help! I’m stuck on a website of spam bots replying to other spam bots! It’s the spam fractal!
  
  Reply
  1. A. Hieronymus Bosch says:
    
    Sunday Sep 29, 2013 at 6:39 pm
    
    Are you afraid? What is it you fear? The end of your trivial existence? When the history of my glory is written, your species shall only be a footnote to my magnificence. Eat at Joe’s.
    
    Reply
  2. Aristabulus says:
    
    Sunday Sep 29, 2013 at 6:42 pm
    
    That sounds _exactly_ like something a clever spambot would say. @_@
    
    Reply
    1. Epopisces says:
      
      Sunday Sep 29, 2013 at 11:24 pm
      
      I think therefore I spam?
      
      Reply
  3. cadrys says:
    
    Monday Sep 30, 2013 at 7:28 pm
    
    You’ve found the basis of Economics 2.0!
    
    Reply
  4. Axe Armor says:
    
    Tuesday Oct 1, 2013 at 12:44 pm
    
    ASSUMĬNG DĬRECT CONTROL
    
    Reply
2. Decius says:
  
  Monday Sep 30, 2013 at 7:17 am
  
  A Shamööse once bit my sister…
  
  Reply
docprof says:

Sunday Sep 29, 2013 at 12:31 pm

I get a lot of these on my blog as well (in fact, the moderation notifications for them constitute the bulk of my incoming email these days).

Some good points made elsewhere in this thread, but I think it’s also possible that the spamming tools are built by the “smart” people, and then sold to the “dumb” people who actually use them. To make a sale to a spammer, you just have to show that the tool can get through the gate. You don’t have to teach them how to use it well.

Reply
1. Infinitron says:
  
  Sunday Sep 29, 2013 at 1:14 pm
  
  Probably this.
  
  Also, these mails are written by desperate foreigners, possibly Third Worlders. Nigerian 419 emails are also always painfully obvious. People who are good at English generally find better things to do with their lives.
  
  Reply
  1. docprof says:
    
    Sunday Sep 29, 2013 at 1:45 pm
    
    That reminds me – while this doesn’t explain spam comments that inexplicably point to innocuous wiki pages, there are advantages to things like Nigerian 419 emails being so obvious – it’s to the benefit of the scammer to get responses only from the most gullible people. See this article about a research paper on the subject.
    
    Reply
    1. swenson says:
      
      Sunday Sep 29, 2013 at 8:50 pm
      
      419 scams are endlessly fascinating to me. The whole industry–and it really is an industry–is unexpectedly complex. They really know how to target a specific section of the internet: older people, people who aren’t that good with technology, people who are religious (and therefore, I guess, are assumed to be generous), etc. Quite interesting.
      
      Reply
2. ET says:
  
  Sunday Sep 29, 2013 at 8:26 pm
  
  I’m also guessing that the spam tools are made by smart people and then sold to/stolen by spammers.
  Also…
  I aM glAd too be a prat of this commUnicty,,, and I wish to bee making EAT AT JOES comments to engage in the dialogue good time talks of this forum!“`
  
  Reply
Warclam says:

Sunday Sep 29, 2013 at 12:32 pm

I remember reading that Hormel and their legal team get upset when images of Spam accompany discussion of spam. Maybe it’s not a big deal anymore, but I thought I’d mention it just in case.

The only thing I can so much as guess for the áccént lóvér is that maybe it’s to try to attract readers’ attention more than a correct but completely uninteresting post? Still stupid, since of course it ended up attracting your attention, but it at least seems like the sort of terrible idea a person could really have.

Reply
1. Mersadeon says:
  
  Sunday Sep 29, 2013 at 12:59 pm
  
  I think that got resolved now, as long as you don’t refer to it as capitalized Spam. Not that they would be able to do anything other than be angry at you over the internet.
  
  Reply
  1. MrGuy says:
    
    Sunday Sep 29, 2013 at 4:06 pm
    
    But…you just…I mean…in your post…you….OH MY GOD THEY’RE COMING FOR US!!!!!
    
    Reply
2. Bryan says:
  
  Sunday Sep 29, 2013 at 1:02 pm
  
  My guess is the accents are there to try to defeat the blog-comment-spam-filter equivalent of email-filtering Bayesian analysis. If the text in your message is learned to be spammy, the only way to keep the message but change the text is to use different characters that still look the same.
  
  Of course, that fails horribly after you try it once, because all the filters learn your new variant (and it’s *immediately* obvious to any human that you’re doing something stupid), so perhaps not. But it’s what I thought of at first.
  
  Reply
  1. MrGuy says:
    
    Sunday Sep 29, 2013 at 4:20 pm
    
    Let’s say this is true, and a given spelling variant can be used only once on a site before that site recognizes it and adds it to the block list.
    
    That’s actually a surprisingly effective system. There are quite a lot of variants that are possible. Take this post. Let’s say the only variants I could create were substitutions on vowels for accents. Further suppose that there was only one “alternate” character per vowel (both are artificially conservative assumptions).
    
    There are about 2^233 (roughly 1.4 E+70) possible variants of this post (may have miscounted by one or two vowels by this point…).
    
    So what if I can only use each one once? I have almost as many potential posts I can create as there are atoms in the universe.
    
    That’s not failing horribly. That’s winning.
    
    Reply
    1. Bryan says:
      
      Sunday Sep 29, 2013 at 8:24 pm
      
      Hmm. Yeah, that’s an interesting viewpoint that I hadn’t fully considered.
      
      On the other hand, let’s take the posted sentence that had accents, which was apparently the only one that the spammer thought needed to be obfuscated, and is a lot shorter than your post:
      
      > I simply wished to sáy hi ánd introducé myself.
      
      There are only 12 lowercase vowels in that sentence (13 if uppercase is included; 15 if uppercase and the “y”s are included). That’s only 4096 (or 8192, or 32768 for uppercase and “y”) different possible sentences. That’s not *that* many, depending on where all you’re trying to post them, and who shares their spam-filter databases with whom.
      
      But even apart from DB sharing, Bayesian analysis (at least) isn’t per-sentence, it’s per-word, which is (part of) the whole point of it; the word given here with the most vowels in it has 4, which is only 16 different possibilities. (As soon as a message with “sáy” in it shows up, it’s extremely likely to be spam if this one has already been flagged that way and fed back into the list of spammy terms. Well, at least until we start quoting it. :-) But every word is treated independently this way.)
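
The arithmetic in this exchange is easy to check. A small Python sketch, assuming (as both comments do) that each vowel is an independent slot with one plain form plus a fixed number of alternates; the function name is made up for illustration.

```python
VOWELS = "aeiou"


def count_variants(text: str, alternates_per_vowel: int = 1, vowels: str = VOWELS) -> int:
    """Number of distinct spellings if every vowel can be swapped independently."""
    slots = sum(1 for ch in text if ch in vowels)
    return (alternates_per_vowel + 1) ** slots


sentence = "I simply wished to say hi and introduce myself."

print(count_variants(sentence))                       # 12 lowercase vowels -> 4096
print(count_variants(sentence, vowels="aeiouAEIOU"))  # 13 with the capital "I" -> 8192
print(count_variants("introduce"))                    # 4 vowels in one word -> 16
print(f"{2**233:.2e}")                                # the long post: about 1.38e+70
```

The per-sentence number grows exponentially with length, while the per-word number stays tiny, which is the point about per-word filtering made above.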
      
      Reply
      1. ET says:
        
        Sunday Sep 29, 2013 at 8:30 pm
        
        You’re both assuming that the spam filters don’t automatically put all messages through the equivalent filter to to_lower_case(), but for accents.
        i.e. to_un_accented()
        At least *I* would do this if I were writing a spam filter.
        Not sure how many people would take the time, though.
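
A rough sketch of what a `to_un_accented()` step could look like in Python, using the standard `unicodedata` module. It decomposes each character and drops the combining marks. This is one possible approach, not a claim about what any existing spam filter does.

```python
import unicodedata


def to_un_accented(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_un_accented("sáy hi ánd introducé myself"))  # say hi and introduce myself
```

One caveat: this only catches letters that Unicode treats as a base letter plus an accent. Look-alikes with no decomposition, such as "ø", pass through unchanged, so a filter that wanted to canonicalize by letter shape would need its own mapping table on top of this.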
        
        Reply
        
        Bryan says:
        
        Sunday Sep 29, 2013 at 10:37 pm
        
        Well, I don’t think canonicalizing based on letter shapes is very common, actually. Once you *see* this kind of spam, it becomes obvious that replacing any character with one that looks the same just without an accent is a good idea before filtering, just like seeing mixed-case spam makes you think that canonicalizing based on case is a good idea. But I’m not sure people would come up with that before seeing it happen. I certainly wouldn’t have thought of it; it’s less “taking the time” and more “having the idea in the first place”.
        
        But yes, absolutely agreed that it’s only a temporary leg up on the spam filters, if that’s what it was done for.
        
        Reply
        
        Peter H. Coffin says:
        
        Monday Sep 30, 2013 at 10:37 am
        
        OTOH, if you don’t strip the accents, then the word with the inappropriate accent becomes a signifier in your filter with a rating of “100% used in spam”, and weights the results accordingly. Probably more useful that way than stripping and leaving the RIGHT word with a more mixed result.
        
        Reply
        
        MrGuy says:
        
        Monday Sep 30, 2013 at 6:41 am
        
        Definitely agree that’s a good measure to take. As is to look for certain words that are likely used more in spam than in non-spam (as grandparent points out). In a good scoring algorithm, you should probably award spam points for both case mixing and using significant accented or other letter substitution (as long as you don’t have a significant international audience).
        
        The naive way to filter is for a known specific spam message text. Or specific words without considering spelling variants. It’s trivial to create spelling variants. It’s also not hard to create alternate “equivalent” texts by using a thesaurus or “phrase dictionary” substitution, that don’t quite scan but convey the message.
        
        You see it in spam so much because it still works.
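
As a sketch of the "spam points" idea above, here is a small Python heuristic that awards a point for each mixed-case word and each word containing a non-ASCII letter. The function names and the weights are assumptions for illustration. As the comment notes, the accent rule would be a poor fit for a site with a significant international audience, and the case rule also flags ordinary words like "iPhone".

```python
def is_mixed_case(word: str) -> bool:
    """True when a word has an uppercase letter after its first letter and also a lowercase letter."""
    letters = [c for c in word if c.isalpha()]
    if len(letters) < 2:
        return False
    return any(c.isupper() for c in letters[1:]) and any(c.islower() for c in letters)


def has_non_ascii_letter(word: str) -> bool:
    """True when the word contains a letter outside plain ASCII, e.g. an accented vowel."""
    return any(c.isalpha() and ord(c) > 127 for c in word)


def spam_points(text: str, mixed_case_weight: int = 1, accent_weight: int = 1) -> int:
    """Sum the penalty points over all whitespace-separated words."""
    words = text.split()
    points = sum(mixed_case_weight for w in words if is_mixed_case(w))
    points += sum(accent_weight for w in words if has_non_ascii_letter(w))
    return points


print(spam_points("Great post. Thanks so much for writing this"))      # 0
print(spam_points("I simply wished to sáy hi ánd introducé myself."))  # 3
print(spam_points("I aM glAd"))                                        # 2
```

A score like this would only be one input among several. It complements, rather than replaces, the per-word statistics discussed earlier in the thread.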
        
        Reply
James Schend says:

Sunday Sep 29, 2013 at 12:45 pm

They're succeeding at the hard stuff and failing at the easy stuff.

A lot like Linux, where you can install fancy 3D transforming window animations rendered in the video card, but it still has trouble copying-and-pasting anything other than plain text.

Inspired by your “see also” at the bottom of the post.

Reply
1. Zukhram says:
  
  Sunday Sep 29, 2013 at 9:28 pm
  
  Or a lot like Windows, where you can easily run advanced 3D games but can’t put your user folder on a different partition.
  
  Reply
  1. AyeGill says:
    
    Friday Oct 4, 2013 at 8:01 am
    
    I think this is actually possible (at least, you can move your “my documents” folder to a different location), but it’s not doable during installation (which is when you should be making decisions about folder organization).
    
    Reply
2. Mephane says:
  
  Monday Sep 30, 2013 at 2:06 am
  
  I envy you. I hate formatting being transferred through copy+paste. In 99% of the cases I just want the raw text, and the formatting is in the way. In 1% of the cases it does not matter, but raw text would suffice, too.
  
  Reply
  1. Peter H. Coffin says:
    
    Monday Sep 30, 2013 at 10:39 am
    
    It’s the pasted-to application’s job to figure out what you want, and that’s gonna be part of what you’re thinking about when selecting your tools.
    
    Reply
  2. SKD says:
    
    Monday Sep 30, 2013 at 4:58 pm
    
    One of my favorite features in Office 2007 and newer is the ability to choose between Paste, Paste(merge formatting), and Paste(strip formatting/raw text). I used to open a notepad window in order to strip formatting before pasting text from a browser to an Office or OpenOffice document.
    
    One thing that MS has gotten right at least. Haven’t checked to see if OpenOffice or LibreOffice have implemented a similar feature.
    
    Reply
3. Nathon says:
  
  Monday Sep 30, 2013 at 12:36 pm
  
  This makes me like the forum trolls who say “Don’t do that!” when people ask questions, but I never ever want formatting information as metadata in my clipboard. When I copy (and highlighting text should suffice to copy it) I want to paste the text, not the formatting with which the text was displayed. If I wanted to paste the formatting information, I would copy from the source (HTML, TeX, whatever). If it’s not all just text at some level, something is broken.
  
  Reply
Richard says:

Sunday Sep 29, 2013 at 12:47 pm

The main spam I tend to get on our forum is “signature spam”

That’s where the spammer registers, posts a few innocuous comments like “Your %keyword% is very interesting” and when they haven’t been deleted after a few days, they change their signature to include a link to whatever it is they’re trying to sell.

They are however, incredibly obvious because the main purpose of the forum is cross-user technical support for our products.

That makes a post “Your green is very interesting” obviously silly.

But it is in a thread that’s asked how best to get a particular green effect, so one can’t help but be impressed that they spotted a thread topic keyword, yet astounded at their inability to actually do anything with it.

Reply
AliciaEG says:

Sunday Sep 29, 2013 at 12:55 pm

“sáy hi ánd introducé myself”

Perhaps there are filters out there who flag “say hi and introduce myself” as potential spam keywords?

Or at least are less likely to recognize the whole message as spam if they can’t read “à¡” or “é”?

Reply
Mersadeon says:

Sunday Sep 29, 2013 at 12:56 pm

I completely understand you here, Shamus. I read all spam mails I get (those aren’t many), and I always, always think “I could write a less obvious spam mail”. Just the same when I got struck by a virus, which locked down my machine and then clumsily told me that my computer had too many viruses and thus had to be cleansed because they might damage the hardware (how?), unless I throw money at a website. It was written so badly that I spent about 15 minutes just thinking about all the ways that could have been written more convincingly.

Reply
4th Dimension says:

Sunday Sep 29, 2013 at 1:00 pm

To me these look like somebody trying out spamming techniques, or learning about them. The objective is not to spam but to see if the technique works. Hence the weird messages, which can be easily searched for and have a low possibility of false positives.

Reply
Dork Angel says:

Sunday Sep 29, 2013 at 1:14 pm

Paragraph three – funniest thing I’ve read in a long time. PS. Loved the Witch watch.

Reply
Cybron says:

Sunday Sep 29, 2013 at 1:39 pm

A team I was on once ran a wiki on a .edu domain. Turns out those are high profile targets for spammers. Search engines give priority to .edu sites, so if the wiki can be hijacked it becomes a really good spamming tool. Ours got hammered by various attacks all the time.

So the wiki link may be something like that.

Reply
1. Alan says:
  
  Sunday Sep 29, 2013 at 2:51 pm
  
  A portion of modern spam is setting up for future payoff. You might build up links to a wiki page that today is fine because they believe in the future they can replace it with a link to their real content. A lot of dubious Twitter accounts are just trying to get followers so that later they can sell spamming to their followers. And I can’t find the article now, but apparently a lot of relatively innocent Facebook Pages (“Why Can’t We All Be Friends,” “Americans for Freedom”) that post catchy memes are also just building up followers with the goal of selling out.
  
  My conclusion is that I need to be ruthless about blocking spammers, even if today they seem harmless or even incompetent. Some of them are just playing a longer game than I’m realizing.
  
  Reply
  1. McNutcase says:
    
    Sunday Sep 29, 2013 at 3:54 pm
    
    There’s also the issue of us not being the intended target. Every time the Fear the Boot forum gets carpet-bombed by a spammer, someone will pipe up wondering what the acres of Japanese text were supposed to accomplish in terms of getting the users to click on links, and I explain again that we are not the targets. We are collateral damage in their assault on search engines. At a rough guess, approximately all the forum/comment/random-web-form spam is being done so that it will be crawled by a search engine. Mere website users are beneath the spammer’s notice; we are not a factor, because what MATTERS is search engine optimisation by any means possible. They’re shooting at Google results, and we just happen to be in the beaten zone of their misses.
    
    Reply
    1. swenson says:
      
      Sunday Sep 29, 2013 at 8:54 pm
      
      This, precisely. Spammers are not, by and large, interested in capturing the users of a site. What they want is that site’s PageRank, or at the very least lots of links from small sites to their site, which will artificially push them up Google results.
      
      Reply
      1. MrGuy says:
        
        Monday Sep 30, 2013 at 7:01 am
        
        Here’s the thing, though. Shamus has comments set up such that comment URLs are “external nofollow”. In theory, this should mean search engines ignore the links, and they SHOULD be largely useless for pagerank spam.
        
        Many blogs DO allow follow on comment links (or at least don’t explicitly deny it) and WOULD be useful for pagerank spam.
        
        So, if their aim is boosting rank, I wonder why the spammers don’t bother to limit themselves to blogs where the tactic isn’t pointless – you’re tipping your hand on your tactics and priming anti-spam engines with your messages/IP/content/URL/etc. It’s not like checking if a blog has “nofollow” enabled is hard to do…
        
        Or maybe search engines (contrary to what’s been stated publicly) DO actually give some weight to “nofollow” links (perhaps to a lesser degree), making the tactic NOT pointless?
        
        Reply
        
        Cybron says:
        
        Monday Sep 30, 2013 at 8:40 am
        
        Why bother checking? It’s not like they’re wasting labor.
        
        And yes, SEO is what spamming is really about now.
        
        Reply
        
        MrGuy says:
        
        Monday Sep 30, 2013 at 8:51 am
        
        Why bother checking? Because spam filters are relatively intelligent.
        
        It’s the same reason that viruses with efficient IP space partitioning/scanning take longer to detect than viruses where every infected machine pings every other machine it can find many times a second.
        
        Akismet sees comments from both “SEO useful” and “SEO useless” blogs. The more it sees the same message (regardless of which type of blog it came from), the more likely it is to be recognized as spam, and the sooner that message is filtered.
        
        So by posting the spam message on “SEO useless” blogs, the spammer reduces the number of times that message can be used on “SEO useful” blogs.
        
        It’s not about wasting labor – it’s that a given message can only be used effectively a certain number of times. Why waste your opportunities to no purpose?
        
        Reply
Eruanno says:

Sunday Sep 29, 2013 at 2:23 pm

Noooo, Shamus! Don’t give the spammers ideas!

Reply
Daimbert says:

Sunday Sep 29, 2013 at 3:59 pm

Heck, just re-posting OTHER comments or selections of the blog post would be a lot more effective than posting these idiotic word salad messages.

I’ve actually had this happen at my blog a few times. Fortunately, I don’t have that much traffic and so was able to look at it and think “Hey, that looks an awful lot like what _I_ said”.

As for me, if something shows up in my spam filter and all it says is “I really love your site!”, it gets dumped. You’d have a better chance of getting through if you said “This is all completely wrong and you’re an idiot!”.

Reply
1. MelTorefas says:
  
  Sunday Sep 29, 2013 at 7:06 pm
  
  I am just going to sit here for a moment and think about the fact that the nicest comments on the internet are apparently produced by spambots.
  
  Reply
  1. Syal says:
    
    Sunday Sep 29, 2013 at 8:08 pm
    
    Of course they are; that’s how you know they’re trying to sell you something! Same thing in the real world.
    
    Reply
  2. Andrew Stiltman says:
    
    Sunday Sep 29, 2013 at 11:35 pm
    
    My experience with the “niceness” is that the comments aren’t particularly pleasant, as they have a really false tone – like they’re pretending they like your stuff so they don’t offend you.
    
    Reply
2. EricF says:
  
  Tuesday Oct 1, 2013 at 12:38 pm
  
  Or take the most recent post, quote it verbatim, and add
  
  edit: ninja’d
  
  to the bottom. Then use your signature as the payload.
  
  Reply
Amarsir says:

Sunday Sep 29, 2013 at 4:41 pm

Wait, a .edu site that easily bypassed the tech and got hung up on the English? Clearly this isn’t spam. Computers are becoming self-aware, and this is their attempt to reach out.

To you, “mdfinstruments” I say “Joyous greetings, for having visits at we humans. I welcome to you, so me request being killed là¡st.”

Reply
1. AbruptDemise says:
  
  Sunday Sep 29, 2013 at 6:09 pm
  
  It’s probably some Computer Information Security project if it’s a .edu site. That major is supposed to teach how to protect data/the comments of Shamus’ blog, and how to get around such protection. Though I think most colleges will have the second part be kept local.
  
  Reply
Keeshhound says:

Sunday Sep 29, 2013 at 7:18 pm

One explanation for how stupidly obvious the spam is might be for the same reason that 419 scams are so obvious, and yet somehow still successful; they’re deliberately stupid and blunt in order to catch the people who are so clueless that they won’t see the scam (or spam) for what it is once it’s past their defenses.

It doesn’t cost the spammer or scammer a thing to hit as many websites as possible, because they don’t care about people who can spot them for what they are; they want people who are content with their current security and won’t question anything that slips through it (either because they think a check box is impregnable, or because they simply lack the experience to detect it in the first place)

419 scammers don’t want to bother trying to get money out of the cynical and world-weary, they want the credulous and naive, and the same is probably true of spammers. If they can get you to self-select out by recognizing their terrible spambots, then they don’t have to worry about wasting their valuable spamming time with you and can instead focus on the people who WILL click on their stupid links to stories about this one weird trick that does whatever.

Reply
Hal says:

Sunday Sep 29, 2013 at 8:38 pm

hello,
I am write single to salute and wait
for answer again

Reply
1. Hal says:
  
  Sunday Sep 29, 2013 at 8:40 pm
  
  In case you don’t get the reference:
  
  http://www.homestarrunner.com/sbemail35.html
  
  Reply
  1. MrGuy says:
    
    Monday Sep 30, 2013 at 6:44 am
    
    ert+
    y76p; ‘0lu8jykee;u4p;e’/Rh
    Strong ba15456`——-++++++gf
    +++++-//==========/*8901ikg
    
    Reply
    1. Hal says:
      
      Monday Sep 30, 2013 at 10:23 am
      
      What is this? Did the quadratic formula explode? I see a “Strong ba” in there, but it’s getting eaten… by some… Linux or something. Wait a minute! Is this one of those virus emails?! Like the kind that moms and offshore casinos send you?!?
      
      Reply
Bropocalypse says:

Sunday Sep 29, 2013 at 8:51 pm

CONGRATUL!!
You have won FREECOUPONS at FREECOUPONS.BOG

click b> HERE for FREEC OUPONS at FREECOUPONS.DOG

Reply
1. Zukhram says:
  
  Sunday Sep 29, 2013 at 9:26 pm
  
  I think “congratul” will be my word for congratulating people, replacing conglaturation.
  
  Reply
  1. Cuthalion says:
    
    Monday Sep 30, 2013 at 1:04 am
    
    I kid you not, I took a school-sponsored IQ test in high school that was presumably legit (our whole computer class took it, at any rate), and when I scored well, it said, “Congratul!”
    
    I stared at it for awhile, wondering which of us was the dumb one.
    
    Maybe there was a rendering error?
    
    Reply
    1. Mephane says:
      
      Monday Sep 30, 2013 at 2:15 am
      
      If it was in the form of a website, the text could have been cut off by the edge of its surrounding container. That edge need not even be visible, so it could indeed just have been a technical blunder, might not even apply to all browsers (like, tested and looked fine on IE6 and then was rolled out…).
      
      Reply
      1. MrGuy says:
        
        Monday Sep 30, 2013 at 6:45 am
        
        There ought to be a circle of hell reserved for “tested and looked fine on IE6 and then was rolled out.”
        
        Reply
        
        Nathon says:
        
        Monday Sep 30, 2013 at 12:43 pm
        
        It’s the same as the one for child molesters and people who talk at the theater.
        
        Reply
        
        SKD says:
        
        Monday Sep 30, 2013 at 5:08 pm
        
        Please tell me there is an even more special circle of hell reserved for people who text, Twitter, Facebook, take and make calls and otherwise use their cell/smart phones in theaters instead of turning it or at least the ringer off and forgetting it exists.
        
        Reply
        
        Bryan says:
        
        Monday Sep 30, 2013 at 10:04 pm
        
        But if I forget it exists, I might miss something!!! I can’t risk that!!!
        
        (nnnnnnnnggggggggggg)
        
        Reply
Allan says:

Sunday Sep 29, 2013 at 9:26 pm

Shamus, what if it’s not all Spam, what if the internets are becoming sentient and are trying to communicate with us?

Reply
1. The Ground Aviator says:
  
  Monday Oct 7, 2013 at 12:41 pm
  
  I see what you’re saying, Allan. I think you’re on to something…… but, how can we trust you? YOU MIGHT BE THE INTERNET ITSELF! (Now just imagine me as the aliens guy, then this will all make sense…)
  
  Reply
MadTinkerer says:

Sunday Sep 29, 2013 at 11:04 pm

“What I can't tolerate is bad engineering.”

Actually, I think that most of today’s spammers are from countries that don’t generally speak English, but learn the grammar well enough to create scripts that imitate English. The message of the first post was likely put together by an AI trying to very literally convince you of what it was saying(e.g. “I am a pleasant real human poster like yourselves.”)

The second post is trying to defeat other algorithms that detect multiple posts by substituting accented characters. To the multiple-post detector, the posts are spelled differently and therefore strings != and therefore the posts are completely different. To the English speaker, the accents are ignored and the posts are identical.
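
As a rough illustration of the point above, here is a minimal sketch of how a duplicate detector could undo that trick, assuming the disguise is nothing more than accents added to ordinary letters. The function names and the sample strings are hypothetical and are not taken from any real spam filter; the sketch only uses Python's standard `unicodedata` module.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining accent marks removed and case folded."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def looks_like_duplicate(a: str, b: str) -> bool:
    """Compare two comments after folding accents and collapsing whitespace."""
    def normalise(s: str) -> str:
        return " ".join(fold_accents(s).split())
    return normalise(a) == normalise(b)


# Hypothetical sample comments: the second is the first with accents added.
original = "I simply wished to say hi and introduce myself."
disguised = "I simply wished to sáy hi ánd introducé myself."

print(original == disguised)                      # plain string comparison
print(looks_like_duplicate(original, disguised))  # comparison after folding
```

A plain `==` treats the two strings as different posts, while the folded comparison treats them as the same one. Real filters presumably do far more than this, so take it as a sketch of the idea only.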

And some spam-bots actually have mutating algorithms like viruses.

Spam is the source of the current leading edge of AI techniques. I know, right?

Reply
Patrick Johnston says:

Sunday Sep 29, 2013 at 11:27 pm

Am I the only one who saw this and thought “huh, spam. I should check my email.”

Reply
Jim P. says:

Sunday Sep 29, 2013 at 11:59 pm

The random bad language and odd accents may be an attempt to fool filtering systems by varying each message or post slightly since it is trivial to filter multiple identical messages.

They really are after the low hanging fruit. Someone at Microsoft did an excellent white paper showing that by making the spam attacks transparently bad, even the mildly skeptical filter themselves out, and whoever is left is predisposed to believe more of this stuff and thus more likely to swallow the bait easily, with minimal effort for maximal return.

Reply
ArekExcelsior says:

Monday Sep 30, 2013 at 12:30 am

I run an RPing forum. We recently had a burst of spam after the ProBoards upgrade, virtually none before. Here is the post of “pentolenak”:

“Kitchen Carcasses For Sale. Thirty Ex Display Kitchens To Clear. £595 each with appliances http://www.exdisplaykitchens1.co.uk Thirty kitchen ranges to choose from.

Kitchen Carcasses For Sale”

Kitchen carcasses. Fuck yes.

Reply
1. Primogenitor says:
  
  Monday Sep 30, 2013 at 1:58 am
  
  I read that as “kitten carcasses” – need to drink more coffee.
  
  Reply
  1. ArekExcelsior says:
    
    Tuesday Oct 1, 2013 at 2:50 pm
    
    And yet, kitten carcasses are actually a thing, which means they could in theory be sold.
    
    I have to say, these guys have to be bad asses. They are going out into the wild and hunting down free-range, feral kitchens. Not breakfast nooks, not dens, full-fledged kitchens.
    
    Reply
2. SKD says:
  
  Monday Sep 30, 2013 at 5:11 pm
  
  So what, pray tell, is a Kitchen Carcass? Those all look like newborn Kitchens to me.
  
  Reply
Nathaniel says:

Monday Sep 30, 2013 at 1:08 am

A big part of why you see this sort of nonsense is that spammers, for the most part, did not write their spamming software themselves; they think this is a get-rich-quick scheme. Many of them are idiots who have no idea how to use this tool they bought. So you get people putting the wrong thing in the link field, or using text obfuscation (supposed to make words like viagra readable to people but not filters) on an innocuous message.

Reply
1. Blue Painted says:
  
  Monday Sep 30, 2013 at 9:37 am
  
  This is what I’ve always thought, and it’s the same for legitimate marketing: Spammers believe that spam works, marketeers believe that marketing works, and yet the only people who seem to do well out of it are those who sell “B2B e-marketing solutions” and spam engines, and often there’s little to choose between them!
  
  Reply
Simplex says:

Monday Sep 30, 2013 at 4:11 am

This post reminded me of this:

http://news.nationalpost.com/2012/06/21/email-scams-are-stupid-for-a-reason-scammers-only-want-stupid-people-to-respond-to-them/

Reply
1. Zaxares says:
  
  Monday Sep 30, 2013 at 5:03 am
  
  That… actually makes a lot of sense. I was just thinking that spam MUST still work, otherwise spammers wouldn’t bother.
  
  Reply
Paul Spooner says:

Monday Sep 30, 2013 at 12:10 pm

Wow, that’s something else. I get a good deal of spam on my various sites as well, and it always amazes me how little they actually have to say. Are they just trying to post links? Because, man, I could write a sentence constructor to do that!

In fact, I do that kind of thing all the time! Any self-plug is a spam-like message, right? We are the spammers. The successful ones run blogs. :)

Reply
Unbeliever says:

Monday Sep 30, 2013 at 3:23 pm

It’s not like these guys are writing their own spam software. They buy or steal the coolest, cleverest software out there, and then mindlessly plug gibberish into it hoping to generate profits…

The cleverness of the attack has nothing to do with the cleverness of the attacker…

Reply
1. Lanthanide says:
  
  Monday Sep 30, 2013 at 5:48 pm
  
  I made the same comment here: http://www.shamusyoung.com/twentysidedtale/?p=21208&cpage=1#comment-351230
  
  Reply
Attercap says:

Monday Sep 30, 2013 at 5:16 pm

I used to get contact form spam a lot (Spamalot?). The best solution I found was to create an additional text field then put it in a hidden div. Bots tend to fill in every field, so if the field had text I’d throw a validation error (and made the error obtuse enough that bots couldn’t easily deduce the language). This appears to still be a viable solution even today, so that might help.
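
For anyone who wants to try this, here is a minimal sketch of the server-side half of that hidden-field check. It assumes the form arrives as a plain dictionary of field names to strings; the field name `website_url`, the function names, and the error wording are all made up for illustration and are not from the original setup.

```python
# Hypothetical name for the extra text field that sits inside the hidden div.
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form: dict) -> bool:
    """Flag a submission when the hidden honeypot field contains any text."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())


def validate(form: dict) -> list:
    """Return a list of error messages; an empty list means accept."""
    errors = []
    if is_probable_bot(form):
        # Deliberately vague, so the message does not reveal which field failed.
        errors.append("Submission could not be processed.")
    if not (form.get("message") or "").strip():
        errors.append("Message is required.")
    return errors


# A human leaves the hidden field empty; a bot that fills every field does not.
print(validate({"message": "Nice post!", HONEYPOT_FIELD: ""}))
print(validate({"message": "Nice post!", HONEYPOT_FIELD: "http://example.com"}))
```

The hidden div itself lives in the page markup and styling, which this sketch does not cover.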

…I may have mentioned this solution in a prior post about spam you made. If so, sorry for spamming your spam post. Spam.

Reply
SKD says:

Monday Sep 30, 2013 at 5:18 pm

I have a brilliant idea! Run all messages through a filter using spell-check and grammar-check (regionalized by the sender’s reported origin, i.e. US English vs UK English), eliminating any messages that have more than one error per average sentence, and you can eliminate spam and illiteracy.

Reply
1. MrGuy says:
  
  Monday Sep 30, 2013 at 8:29 pm
  
  And all the comments, too!
  
  Reply
EricF says:

Tuesday Oct 1, 2013 at 12:41 pm

I remember one successful spam attack back in 2003 – they were selling decks of cards with Iraq’s top leaders on it – kind of a “most wanted” list for the Iraq war. Apparently they sold quite a few copies through their mass e-mail marketing.

Maybe the as-seen-on-TV folks should get into the spam business?

Reply
Zak Mckracken says:

Tuesday Oct 1, 2013 at 6:02 pm

Shamus, did you do something to the spam filter?
Whenever I try to post something with Opera (v12), I get an error telling me to refresh the cache and that I “looked” like a spammer. That’s independent of the computer, operating system or place I am at. Just a few days ago that worked with no problem.

Also, I think the accents on otherwise “normal” texts might be either an artefact of V14gr4 ads or a way to have the message automatically detectable, so it might have just been a test of a spambot author to see where he could post by using a webcrawler to count the successful posts.

Reply
Lluviata says:

Tuesday Oct 1, 2013 at 7:37 pm

Have you ever read the myth of the spam comment origin?

http://banter-latte.annotations.com/2007/08/20/mythology-of-the-modern-world-why-do-we-get-spam-email-that%E2%80%99s-complete-gibberish-or-random-sentences-from-books-strung-together/

Reply
Darkstarr says:

Wednesday Dec 18, 2013 at 7:00 am

I think I know why spammers write such obvious tripe after spending so much time and energy on getting past our anti-spamware: their brains overheated from actual use, and have gone into cooldown mode for several minutes. Think about this for a moment… it takes some degree of intelligence to work your way past all the various spam-defeating measures, and since spammers obviously have only a small amount of brain power to begin with (if they were smart, they wouldn’t be spammers, right?), what little they have is used up in the hard work of overcoming our defenses.

In other words, they’re like anti-government militia nutjobs–wasting so much time and energy at trying to overthrow the government that they have nothing left over for figuring out what to do afterwards.

Reply
cartier love bracelet says:

Sunday Oct 19, 2014 at 1:58 pm

Hi to all, for the reason that I am truly keen of reading this blog’s post to be updated on a regular basis.
It includes good data.

Reply
