Spam Script

By Shamus Posted Wednesday Feb 17, 2016

Filed under: Notices 94 comments

I get a lot of spam on this blog. This place is quasi-popular and pretty dang old by internet standards. That makes it an attractive target for spammers, and it means there’s been plenty of time for this site to get added to everyone’s Rolodex.

There are several layers of protection between the idiots and your eyeballs. First are the crude tools: IP blocking for particularly naughty IP ranges[1]. Then there’s filtering for clearly red-flag behaviors, like someone trying to leave comments who never loaded the page they’re supposedly commenting on, or people leaving a comment every few seconds. After that is the keyword checking, and filtering out people who post tons of links.

The final layer of spam filtering is… me.

Some filters delete the comment instantly. Some filters put the spam into the spam folder, where I’ll never see it unless I go looking for it. Some filters put suspect messages into moderation, where I’ll see them and have to choose to approve or delete them.
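To make the layering concrete, here is a toy Python sketch of a filter chain that sends each comment to one of the three outcomes above, or lets it through. Everything in it is an assumption: the post doesn't say which software or rules this site actually uses, so the thresholds, the keyword list, the `Comment` fields, the blocked range, and the choice of which check leads to which outcome are all made up for illustration.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical settings, for illustration only. 203.0.113.0/24 is a range
# reserved for documentation, standing in for a "naughty" block.
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
MIN_SECONDS_BETWEEN_COMMENTS = 30
MAX_LINKS = 3
BLOCKED_KEYWORDS = ("ugg boots", "cheap pills")
LINK = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Comment:
    ip: str
    text: str
    loaded_page: bool  # did this client ever fetch the post it is replying to?
    seconds_since_last: Optional[float]  # None if this is the IP's first comment


def route(comment: Comment, blocked_ranges=BLOCKED_RANGES) -> str:
    """Return "delete", "spam", "moderate" or "approve" for one comment."""
    # Crude tools: anything from a blocked range never gets a hearing.
    addr = ipaddress.ip_address(comment.ip)
    if any(addr in net for net in blocked_ranges):
        return "delete"
    # Red-flag behavior: commenting on a page it never loaded...
    if not comment.loaded_page:
        return "delete"
    # ...or posting faster than a person plausibly could.
    if (comment.seconds_since_last is not None
            and comment.seconds_since_last < MIN_SECONDS_BETWEEN_COMMENTS):
        return "spam"
    # Keyword checks and link counting.
    if any(word in comment.text.lower() for word in BLOCKED_KEYWORDS):
        return "spam"
    links = len(LINK.findall(comment.text))
    if links > MAX_LINKS:
        return "spam"
    # Mildly suspect comments (here: any link at all) wait for a human.
    return "moderate" if links else "approve"
```

Note that a polite comment with no links and none of the keywords comes out as "approve" here, which shows why subtle, plausible-sounding spam is hard for rules like these to catch.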

Spam is kind of like the weather. It varies in intensity. Some days I’ll only see a few, and some days I’ll see dozens. But over the last couple of years the spam has settled into a very predictable pattern. I never see porn links these days. Piracy stuff is now super-rare. Brute-force word salad messages are either an abandoned technique, or the filters have gotten really good at catching them, because I never see those anymore. The messages that are just dozens of links don’t get through. The only thing I see these days is plausible-but-fake comment spam, which is just a little too subtle for the spam filters to detect. These are messages of semi-coherent English with no links. Usually the given name is the thing they’re selling, so that (if I allowed the spam through) clicking on the name would take you to their site.

The messages usually look something like this:

Ugg Boots July 30, 2015 at 7:06 am

Hi, I’m visit your site today and was surprised at what I found? You have the skill at conveying the Subject at hand. I would like to reply more about this topic and hope will to in the future. All the best,

This is clearly nonsense, but it’s enough like proper English to fool most spam filters. I always assumed there were just a few messages like this and spam programs sent them at random. But apparently (as we’ll see below) they’re assembled from phrase lists.

Hilariously, bots are often misconfigured so the website link doesn’t get filled in, meaning even if they got past the filter, got past me, and someone actually decided to click through to their website for some unimaginable reason, the link wouldn’t take them anywhere. This is amazingly common. In fact, of the spam I see, I’d say at least half have a missing or malformed URL, and thus no payload whatsoever.
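Checking for a missing payload is simple enough to sketch. Below is a minimal Python version built on the standard library's `urllib.parse.urlsplit`. The function name `has_payload` and the rules it applies (an `http`/`https` scheme plus a plain hostname) are assumptions for illustration, not how any particular comment system validates the website field.

```python
from typing import Optional
from urllib.parse import urlsplit


def has_payload(url: Optional[str]) -> bool:
    """True if a comment's website field looks like a link someone could follow."""
    url = (url or "").strip()
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""  # lowercased host, without port or credentials
    except ValueError:  # e.g. a malformed IPv6 literal like "http://[oops"
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    # Plain hostnames only use letters, digits, hyphens and dots; debris such as
    # a hypothetical unexpanded "{url}" placeholder fails this test.
    return all(ch.isalnum() or ch in "-." for ch in host)
```

Empty fields, bare domains with no scheme, and junk hosts all come back `False`, which covers the "missing or malformed" cases without ever fetching the URL.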

But last night someone made a mistake I’d never seen before. Instead of sending a spam message, they sent the data used to build the spam messages. (And also no payload. It’s amateur night!) I’ve posted the complete source below.

It’s pretty amazing to read this, because it’s basically every spam message I’ve read in the past year, all smeared together into a big pile of nonsense. It’s totally alien and yet strangely familiar.

We can see how the script works. When it comes to a set of curly brackets, it treats the contents as a list of options separated by the vertical bar | character. It picks one option at random from that list. It’s recursive, so it looks for brackets within those brackets, and so on.
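Here's a minimal Python sketch of an expander that works the way described above. The spammer's actual code isn't part of what leaked, only the data, so the names (`parse`, `spin`, `count`) and the details are assumptions. It parses the template into a tree first, then walks the tree picking one option per group. Two choices worth flagging: a `|` or `}` outside any group is kept as literal text, and a group that never closes simply ends at the end of the input instead of raising an error.

```python
import random


def parse(template):
    """Parse spintax into a tree.

    A sequence is a list of pieces; each piece is either a plain string or a
    group. A group is a tuple of alternatives, and each alternative is itself
    a sequence, which is how nesting falls out of the recursion.
    """
    pos = 0

    def sequence(inside_group):
        nonlocal pos
        pieces, buf = [], []
        while pos < len(template):
            ch = template[pos]
            if ch == "{":
                if buf:
                    pieces.append("".join(buf))
                    buf = []
                pos += 1  # step past "{"
                pieces.append(group())
            elif inside_group and ch in "|}":
                break  # end of this alternative; group() decides what comes next
            else:
                buf.append(ch)  # at top level, a stray "|" or "}" is literal text
                pos += 1
        if buf:
            pieces.append("".join(buf))
        return pieces

    def group():
        nonlocal pos
        alternatives = [sequence(inside_group=True)]
        while pos < len(template) and template[pos] == "|":
            pos += 1  # step past "|"
            alternatives.append(sequence(inside_group=True))
        if pos < len(template):
            pos += 1  # step past the closing "}"
        # If we ran off the end instead, the unclosed group simply ends there.
        return tuple(alternatives)

    return sequence(inside_group=False)


def spin(tree, rng):
    """Render one message, picking one alternative per group at random."""
    out = []
    for piece in tree:
        if isinstance(piece, str):
            out.append(piece)
        else:
            out.append(spin(rng.choice(piece), rng))
    return "".join(out)


def count(tree):
    """Number of choice paths through the template.

    Choices in sequence multiply; alternatives within a group add. Duplicate
    options (such as two empty ones) are counted separately, so this is an
    upper bound on the number of distinct strings.
    """
    total = 1
    for piece in tree:
        if isinstance(piece, tuple):
            total *= sum(count(alt) for alt in piece)
    return total


if __name__ == "__main__":
    tree = parse("{I have|I've} been {surfing|browsing} online more than {three|3|2|4} hours today")
    print(count(tree))  # 16
    print(spin(tree, random.Random()))
```

Parsing first also makes it easy to see how quickly this "permutates": choices in a row multiply and options within a group add, so just the opening of the dump's first line has 2 × 2 × 4 = 16 variants. Empty options, like the one at the start of `{|my|our|my personal|my own}`, are allowed and simply render as nothing.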

I love how they coded this wonderful, easily-extensible system for making endlessly permutating messages, but then they crammed it full of shitty English that would never get past a human being. And then forgot to add a payload to the resulting mess. Imagine if some genius invented a machine that can perfectly copy any money you put into it, and then some idiot comes along and feeds it Monopoly money, and then tries to use the fake-fake money to buy something inherently worthless anyway, like used lottery tickets. It’s such a hilarious blend of creativity, hubris, laziness, incompetence, and pointless stupidity.

Either way, you end up having to clean up a huge mess made by someone who has decided to waste both your time and theirs. It’s a shame, but it’s a smaller part of my job every year. Progress is slow, but it does feel like we’re winning.

Here’s the source data:

{I have|I've} been {surfing|browsing} online more than {three|3|2|4} hours today, yet I never found any 
interesting article like yours. {It's|It is} pretty worth enough for me.
{In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners} and 
bloggers made good content as you did, the {internet|net|web} will be {much more|a lot more} useful than ever before.|
I {couldn't|could not} {resist|refrain from} commenting.
{Very well|Perfectly|Well|Exceptionally well} written!|
{I will|I'll} {right away|immediately} {take hold of|grab|clutch|grasp|seize|snatch} your {rss|rss feed} as I {can not|can't} {in finding|find|to find} your 
{email|e-mail} subscription {link|hyperlink} or {newsletter|e-newsletter} service.
Do {you have|you've} any? {Please|Kindly} {allow|permit|let} me {realize|recognize|understand|recognise|know} 
{so that|in order that} I {may just|may|could} subscribe.
{It is|It's} {appropriate|perfect|the best} time to make some plans for the future 
and {it is|it's} time to be happy. {I have|I've} read this post and if I could I {want to|wish to|desire to} suggest you {few|some} interesting things 
or {advice|suggestions|tips}. {Perhaps|Maybe} you {could|can} write next articles 
referring to this article. I {want to|wish to|desire to} read {more|even more} things about it!|
{It is|It's} {appropriate|perfect|the best} time to 
make {a few|some} plans for {the future|the longer term|the long run} and {it is|it's} time to be happy.
{I have|I've} {read|learn} this {post|submit|publish|put up} and if I {may just|may|could} 
I {want to|wish to|desire to} {suggest|recommend|counsel} you 
{few|some} {interesting|fascinating|attention-grabbing} {things|issues} or {advice|suggestions|tips}.
{Perhaps|Maybe} you {could|can} write {next|subsequent} articles {relating to|referring to|regarding} this article.
I {want to|wish to|desire to} {read|learn} {more|even more} {things|issues} {approximately|about} it!|
{I have|I've} been {surfing|browsing} {online|on-line} {more than|greater than} {three|3} hours {these days|nowadays|today|lately|as of 
late}, {yet|but} I {never|by no means} {found|discovered} any {interesting|fascinating|attention-grabbing} article like yours.
{It's|It is} {lovely|pretty|beautiful} {worth|value|price} {enough|sufficient} for me.
{In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners} and bloggers made 
{just right|good|excellent} {content|content material} as {you did|you probably did},
the {internet|net|web} {will be|shall be|might be|will probably be|can be|will likely 
be} {much more|a lot more} {useful|helpful} than ever 
Ahaa, its {nice|pleasant|good|fastidious} {discussion|conversation|dialogue} 
{regarding|concerning|about|on the topic of} this {article|post|piece of 
writing|paragraph} {here|at this place} at this {blog|weblog|webpage|website|web 
site}, I have read all that, so {now|at this time} me also commenting {here|at 
this place}.|
I am sure this {article|post|piece of writing|paragraph} has 
touched all the internet {users|people|viewers|visitors}, its 
really really {nice|pleasant|good|fastidious} {article|post|piece of writing|paragraph} on building up new 
{blog|weblog|webpage|website|web site}.|
Wow, this {article|post|piece of writing|paragraph} is {nice|pleasant|good|fastidious}, 
my {sister|younger sister} is analyzing {such|these|these 
kinds of} things, {so|thus|therefore} I am going to {tell|inform|let 
know|convey} her.|
{Saved as a favorite|bookmarked!!}, {I really like|I like|I love} {your blog|your site|your web 
site|your website}!|
Way cool! Some {very|extremely} valid points!
I appreciate you {writing this|penning this} {article|post|write-up} {and 
the|and also the|plus the} rest of the {site is|website is} {also very|extremely|very|also really|really} good.|
Hi, {I do believe|I do think} {this is an excellent|this is 
a great} {blog|website|web site|site}. I stumbledupon it ;) {I will|I am going to|I'm going to|I may} {come back|return|revisit} {once 
again|yet again} {since I|since i have} {bookmarked|book marked|book-marked|saved as a 
favorite} it. Money and freedom {is the best|is the greatest} way to 
change, may you be rich and continue to {help|guide} {other people|others}.|
Woah! I'm really {loving|enjoying|digging} the template/theme of this {site|website|blog}.
It's simple, yet effective. A lot of times it's {very hard|very difficult|challenging|tough|difficult|hard} to get 
that "perfect balance" between {superb usability|user friendliness|usability} and {visual appearance|visual appeal|appearance}.
I must say {that you've|you have|you've} done a {awesome|amazing|very good|superb|fantastic|excellent|great} job 
with this. {In addition|Additionally|Also}, the blog loads {very|extremely|super} {fast|quick} for me on {Safari|Internet explorer|Chrome|Opera|Firefox}.
{Superb|Exceptional|Outstanding|Excellent} Blog!|
These are {really|actually|in fact|truly|genuinely} {great|enormous|impressive|wonderful|fantastic} ideas in {regarding|concerning|about|on the topic of} blogging.
You have touched some {nice|pleasant|good|fastidious} {points|factors|things} here.
Any way keep up wrinting.|
{I love|I really like|I enjoy|I like|Everyone loves} what you guys {are|are usually|tend to be} up too.
{This sort of|This type of|Such|This kind of} clever work and {exposure|coverage|reporting}!
Keep up the {superb|terrific|very good|great|good|awesome|fantastic|excellent|amazing|wonderful} 
works guys I've {incorporated||added|included} you guys to {|my|our||my 
personal|my own} blogroll.|
{Howdy|Hi there|Hey there|Hi|Hello|Hey}! Someone in my {Myspace|Facebook} group shared this {site|website} with us so I 
came to {give it a look|look it over|take a look|check it 
out}. I'm definitely {enjoying|loving} the information. I'm {book-marking|bookmarking} 
and will be tweeting this to my followers! {Terrific|Wonderful|Great|Fantastic|Outstanding|Exceptional|Superb|Excellent} blog and {wonderful|terrific|brilliant|amazing|great|excellent|fantastic|outstanding|superb} {style and design|design and style|design}.|
{I love|I really like|I enjoy|I like|Everyone loves} what you guys {are|are usually|tend to be} 
up too. {This sort of|This type of|Such|This kind of} clever work and {exposure|coverage|reporting}!
Keep up the {superb|terrific|very good|great|good|awesome|fantastic|excellent|amazing|wonderful} works guys I've {incorporated|added|included} you guys to 
{|my|our|my personal|my own} blogroll.|
{Howdy|Hi there|Hey there|Hi|Hello|Hey} would you mind {stating|sharing} 
which blog platform you're {working with|using}?
I'm {looking|planning|going} to start my own blog {in the near future|soon} but I'm having a {tough|difficult|hard} time {making a decision|selecting|choosing|deciding} 
between BlogEngine/Wordpress/B2evolution and Drupal.
The reason I ask is because your {design and style|design|layout} seems different then most blogs and I'm looking for something 
{completely unique|unique}. P.S {My apologies|Apologies|Sorry} for {getting|being} off-topic but I had 
to ask!|
{Howdy|Hi there|Hi|Hey there|Hello|Hey} would you mind letting me know which {webhost|hosting company|web host} you're {utilizing|working with|using}?
I've loaded your blog in 3 {completely different|different} {internet browsers|web browsers|browsers} and 
I must say this blog loads a lot {quicker|faster} then most.
Can you {suggest|recommend} a good {internet hosting|web hosting|hosting} provider at a 
{honest|reasonable|fair} price? {Thanks a lot|Kudos|Cheers|Thank 
you|Many thanks|Thanks}, I appreciate it!|
{I love|I really like|I like|Everyone loves} it {when people|when individuals|when folks|whenever 
people} {come together|get together} and share {opinions|thoughts|views|ideas}.
Great {blog|website|site}, {keep it up|continue the good work|stick with 
Thank you for the {auspicious|good} writeup.
It in fact was a amusement account it. Look advanced to {far|more} added 
agreeable from you! {By the way|However}, how {can|could} we 
{Howdy|Hi there|Hey there|Hello|Hey} just wanted to give you a 
quick heads up. The {text|words} in your {content|post|article} seem to be running off 
the screen in {Ie|Internet explorer|Chrome|Firefox|Safari|Opera}.
I'm not sure if this is a {format|formatting} issue or something to 
do with {web browser|internet browser|browser} compatibility but I {thought|figured} I'd post to let you 
know. The {style and design|design and style|layout|design} look great though!
Hope you get the {problem|issue} {solved|resolved|fixed} soon. {Kudos|Cheers|Many thanks|Thanks}|
This is a topic {that is|that's|which is} {close to|near to} my heart...
{Cheers|Many thanks|Best wishes|Take care|Thank 
you}! {Where|Exactly where} are your contact details though?|
It's very {easy|simple|trouble-free|straightforward|effortless} to 
find out any {topic|matter} on {net|web} as compared to 
{books|textbooks}, as I found this {article|post|piece of 
writing|paragraph} at this {website|web site|site|web page}.|
Does your {site|website|blog} have a contact page? I'm having {a tough time|problems|trouble} locating it but, I'd like to {send|shoot} you an {e-mail|email}.
I've got some {creative ideas|recommendations|suggestions|ideas} for your blog you might be interested 
in hearing. Either way, great {site|website|blog} and I look forward to seeing it {develop|improve|expand|grow} over time.|
{Hola|Hey there|Hi|Hello|Greetings}! I've been {following|reading} your {site|web site|website|weblog|blog} for {a long time|a while|some time} now and finally got 
the {bravery|courage} to go ahead and give you a shout out from {New Caney|Kingwood|Huffman|Porter|Houston|Dallas|Austin|Lubbock|Humble|Atascocita} {Tx|Texas}!
Just wanted to {tell you|mention|say} keep up the {fantastic|excellent|great|good} {job|work}!|
Greetings from {Idaho|Carolina|Ohio|Colorado|Florida|Los 
angeles|California}! I'm {bored to tears|bored to death|bored} at work so I decided to {check 
out|browse} your {site|website|blog} on my iphone during lunch break.
I {enjoy|really like|love} the {knowledge|info|information} you {present|provide} here and 
can't wait to take a look when I get home. I'm {shocked|amazed|surprised} 
at how {quick|fast} your blog loaded on my {mobile|cell 
phone|phone} .. I'm not even using WIFI, just 3G 
.. {Anyhow|Anyways}, {awesome|amazing|very good|superb|good|wonderful|fantastic|excellent|great} {site|blog}!|
Its {like you|such as you} {read|learn} my {mind|thoughts}!
You {seem|appear} {to understand|to know|to grasp} {so much|a lot} {approximately|about} this, {like you|such as you} wrote the {book|e-book|guide|ebook|e book} in it or something.
{I think|I feel|I believe} {that you|that you simply|that you just} {could|can} do with {some|a few} {%|p.c.|percent} to {force|pressure|drive|power} the message {house|home} {a bit|a little bit}, {however|but} {other 
than|instead of} that, {this is|that is} {great|wonderful|fantastic|magnificent|excellent} 
blog. {A great|An excellent|A fantastic} read.
{I'll|I will} {definitely|certainly} be back.|
I visited {multiple|many|several|various} {websites|sites|web 
sites|web pages|blogs} {but|except|however} the audio {quality|feature} for 
audio songs {current|present|existing} at this {website|web site|site|web page} is {really|actually|in fact|truly|genuinely} 
{Howdy|Hi there|Hi|Hello}, i read your blog {occasionally|from time to time} and i own a similar one and i was 
just {wondering|curious} if you get a lot of spam {comments|responses|feedback|remarks}?
If so how do you {prevent|reduce|stop|protect against} it, any plugin or anything you can {advise|suggest|recommend}?
I get so much lately it's driving me {mad|insane|crazy} so any {assistance|help|support} is very much appreciated.|
Greetings! {Very helpful|Very useful} advice {within this|in this particular} {article|post}!
{It is the|It's the} little changes {that make|which will make|that produce|that will make} {the biggest|the largest|the greatest|the most important|the most significant} changes.
{Thanks a lot|Thanks|Many thanks} for sharing!|
{I really|I truly|I seriously|I absolutely} love {your blog|your site|your website}..
{Very nice|Excellent|Pleasant|Great} colors & theme. Did you {create|develop|make|build} {this website|this site|this web site|this amazing site} 
yourself? Please reply back as I'm {looking to|trying to|planning to|wanting 
to|hoping to|attempting to} create {my own|my very own|my own personal} 
{blog|website|site} and {would like to|want to|would love to} {know|learn|find out} where you got this 
from or {what the|exactly what the|just what the} theme {is called|is named}.
{Thanks|Many thanks|Thank you|Cheers|Appreciate it|Kudos}!|
{Hi there|Hello there|Howdy}! This {post|article|blog post} {couldn't|could not} be written {any better|much 
better}! {Reading through|Looking at|Going through|Looking through} this {post|article} reminds me of my previous roommate!
He {always|constantly|continually} kept {talking about|preaching about} 
this. {I will|I'll|I am going to|I most certainly will} {forward|send} {this article|this information|this post} to him.
{Pretty sure|Fairly certain} {he will|he'll|he's going to} {have a good|have a very good|have a great} read.
{Thank you for|Thanks for|Many thanks for|I appreciate you for} sharing!|
{Wow|Whoa|Incredible|Amazing}! This blog looks {exactly|just} like my old 
one! It's on a {completely|entirely|totally} different {topic|subject} but it has pretty much the 
same {layout|page layout} and design. {Excellent|Wonderful|Great|Outstanding|Superb} choice of colors!|
{There is|There's} {definately|certainly} {a lot to|a great deal to} {know about|learn about|find out about} this {subject|topic|issue}.
{I like|I love|I really like} {all the|all of the} 
points {you made|you've made|you have made}.|
{You made|You've made|You have made} some {decent|good|really good} points there.
I {looked|checked} {on the internet|on the web|on the 
net} {for more info|for more information|to find 
out more|to learn more|for additional information} about the issue and found {most individuals|most people} will go along with your views on {this 
website|this site|this web site}.|
{Hi|Hello|Hi there|What's up}, I {log on to|check|read} your {new stuff|blogs|blog} {regularly|like every week|daily|on a 
regular basis}. Your {story-telling|writing|humoristic} style is {awesome|witty}, keep {doing 
what you're doing|up the good work|it up}!|
I {simply|just} {could not|couldn't} {leave|depart|go away} your 
{site|web site|website} {prior to|before} suggesting that I {really|extremely|actually} {enjoyed|loved} {the standard|the usual} {information|info} {a person|an individual} {supply|provide} {for your|on your|in your|to your} {visitors|guests}?
Is {going to|gonna} be {back|again} {frequently|regularly|incessantly|steadily|ceaselessly|often|continuously} {in order to|to} 
{check up on|check out|inspect|investigate cross-check} new posts|
{I wanted|I needed|I want to|I need to} to thank you for this {great|excellent|fantastic|wonderful|good|very good} 
read!! I {definitely|certainly|absolutely} {enjoyed|loved} every {little bit of|bit of} it.
{I have|I've got|I have got} you {bookmarked|book marked|book-marked|saved as a favorite} {to check out|to look at} 
new {stuff you|things you} post...|
{Hi|Hello|Hi there|What's up}, just wanted to {mention|say|tell you},
I {enjoyed|liked|loved} this {article|post|blog post}.
It was {inspiring|funny|practical|helpful}.
Keep on posting!|
{Hi there|Hello}, I enjoy reading {all of|through} your 
{article|post|article post}. I {like|wanted} to write a little comment 
to support you.|
I {always|constantly|every time} spent my half an hour to read this {blog|weblog|webpage|website|web site}'s {articles|posts|articles or reviews|content} 
{everyday|daily|every day|all the time} along with a {cup|mug} of coffee.|
I {always|for all time|all the time|constantly|every time} emailed this 
{blog|weblog|webpage|website|web site} post page to all my {friends|associates|contacts}, {because|since|as|for the 
reason that} if like to read it {then|after that|next|afterward} my {friends|links|contacts} 
will too.|
My {coder|programmer|developer} is trying to {persuade|convince} me to move 
to .net from PHP. I have always disliked the idea because of the {expenses|costs}.
But he's tryiong none the less. I've been using {Movable-type|WordPress} on {a number of|a 
variety of|numerous|several|various} websites for about a year and am {nervous|anxious|worried|concerned} about switching to 
another platform. I have heard {fantastic|very good|excellent|great|good} things about
Is there a way I can {transfer|import} all my wordpress {content|posts} into it?
{Any kind of|Any} help would be {really|greatly} appreciated!|
{Hello|Hi|Hello there|Hi there|Howdy|Good day}!
I could have sworn I've {been to|visited} {this blog|this web site|this website|this 
site|your blog} before but after {browsing through|going through|looking at} {some of the|a few of 
the|many of the} {posts|articles} I realized it's new 
to me. {Anyways|Anyhow|Nonetheless|Regardless}, I'm {definitely|certainly} {happy|pleased|delighted} {I found|I discovered|I came across|I stumbled upon} it and I'll be {bookmarking|book-marking} it and checking 
back {frequently|regularly|often}!|
{Terrific|Great|Wonderful} {article|work}! {This is|That is} {the type of|the 
kind of} {information|info} {that are meant to|that are supposed to|that should} be shared {around the|across the} {web|internet|net}.
{Disgrace|Shame} on {the {seek|search} engines|Google} for {now 
not|not|no longer} positioning this {post|submit|publish|put up} {upper|higher}!
Come on over and {talk over with|discuss with|seek advice 
from|visit|consult with} my {site|web site|website} . {Thank you|Thanks} =)|
Heya {i'm|i am} for the first time here. I {came across|found} this 
board and I find It {truly|really} useful & it helped me out {a lot|much}.
I hope to give something back and {help|aid} others like you {helped|aided} me.|
{Hi|Hello|Hi there|Hello there|Howdy|Greetings}, {I think|I believe|I do believe|I do think|There's no doubt that} {your site|your website|your web site|your blog} {might be|may be|could be|could possibly be} having {browser|internet browser|web 
browser} compatibility {issues|problems}. {When I|Whenever I} {look at your|take a 
look at your} {website|web site|site|blog} in Safari, it looks fine {but when|however when|however, if|however, 
when} opening in {Internet Explorer|IE|I.E.}, {it has|it's got} some overlapping issues.
{I just|I simply|I merely} wanted to {give you a|provide you with a} quick heads up!
{Other than that|Apart from that|Besides that|Aside from that}, {fantastic|wonderful|great|excellent} 
{A person|Someone|Somebody} {necessarily|essentially} {lend 
a hand|help|assist} to make {seriously|critically|significantly|severely} 
{articles|posts} {I would|I might|I'd} state. {This 
is|That is} the {first|very first} time I frequented your {web page|website page} and {to this point|so far|thus far|up to now}?
I {amazed|surprised} with the {research|analysis} you made 
to {create|make} {this actual|this particular} {post|submit|publish|put up} 
{incredible|amazing|extraordinary}. {Great|Wonderful|Fantastic|Magnificent|Excellent} {task|process|activity|job}!|
Heya {i'm|i am} for {the primary|the first} time here.
I {came across|found} this board and I {in finding|find|to find} It {truly|really} {useful|helpful} & 
it helped me out {a lot|much}. {I am hoping|I hope|I'm 
hoping} {to give|to offer|to provide|to present} 
{something|one thing} {back|again} and {help|aid} others {like you|such 
as you} {helped|aided} me.|
{Hello|Hi|Hello there|Hi there|Howdy|Good day|Hey there}!
{I just|I simply} {would like to|want to|wish to} {give you a|offer you a} {huge|big} thumbs up {for the|for your} 
{great|excellent} {info|information} {you have|you've got|you have got} 
{here|right here} on this post. {I will be|I'll be|I am} {coming back to|returning to} 
{your blog|your site|your website|your web site} for more soon.|
I {always|all the time|every time} used to {read|study} {article|post|piece of writing|paragraph} in news papers 
but now as I am a user of {internet|web|net} {so|thus|therefore} from now I am using net for {articles|posts|articles or reviews|content}, thanks to web.|
Your {way|method|means|mode} of {describing|explaining|telling} {everything|all|the whole thing} in this {article|post|piece of writing|paragraph} 
is {really|actually|in fact|truly|genuinely} {nice|pleasant|good|fastidious}, {all|every one} {can|be 
able to|be capable of} {easily|without difficulty|effortlessly|simply} {understand|know|be aware of} it, Thanks a lot.|
{Hi|Hello} there, {I found|I discovered} your {blog|website|web site|site} {by means of|via|by the use of|by way of} Google {at the same time as|whilst|even as|while} {searching for|looking for} a {similar|comparable|related} {topic|matter|subject}, your {site|web 
site|website} {got here|came} up, it {looks|appears|seems|seems to be|appears to be 
like} {good|great}. {I have|I've} bookmarked it in my google bookmarks.
{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} 
{aware of|alert to} your {blog|weblog} {thru|through|via} Google,
{and found|and located} that {it is|it's} {really|truly} informative.
{I'm|I am} {gonna|going to} {watch out|be careful} for brussels.
{I will|I'll} {appreciate|be grateful} {if you|should you|when you|in the event you|in case 
you|for those who|if you happen to} {continue|proceed} this 
{in future}. {A lot of|Lots of|Many|Numerous} {other folks|folks|other people|people} 
{will be|shall be|might be|will probably be|can be|will likely be} benefited {from your|out 
of your} writing. Cheers!|
{I am|I'm} curious to find out what blog {system|platform} {you have been|you happen to 
be|you are|you're} {working with|utilizing|using}?
I'm {experiencing|having} some {minor|small} security {problems|issues} with my latest 
{site|website|blog} and {I would|I'd} like to find something more {safe|risk-free|safeguarded|secure}.
Do you have any {solutions|suggestions|recommendations}?|
{I am|I'm} {extremely|really} impressed with your writing 
skills {and also|as well as} with the layout on your 
{blog|weblog}. Is this a paid theme or did 
you {customize|modify} it yourself? {Either way|Anyway} keep up the {nice|excellent} quality writing, {it's|it is} 
rare to see a {nice|great} blog like this one {these days|nowadays|today}.|
{I am|I'm} {extremely|really} {inspired|impressed} {with your|together 
with your|along with your} writing {talents|skills|abilities} {and also|as {smartly|well|neatly} as} with the {layout|format|structure} {for your|on your|in your|to your} {blog|weblog}.
{Is this|Is that this} a paid {subject|topic|subject 
matter|theme} or did you {customize|modify} it {yourself|your self}?
{Either way|Anyway} {stay|keep} up the {nice|excellent} {quality|high quality} writing, {it's|it is} {rare|uncommon} {to peer|to 
see|to look} a {nice|great} {blog|weblog} like this one {these days|nowadays|today}..|
{Hi|Hello}, Neat post. {There is|There's} {a problem|an issue} {with 
your|together with your|along with your} {site|web site|website} in {internet|web} explorer, {may|might|could|would} {check|test} this?
IE {still|nonetheless} is the {marketplace|market} {leader|chief} and {a large|a good|a big|a huge} {part of|section of|component 
to|portion of|component of|element of} {other folks|folks|other people|people} will {leave out|omit|miss|pass over} your {great|wonderful|fantastic|magnificent|excellent} writing {due to|because of} this problem.|
{I'm|I am} not sure where {you are|you're} getting your {info|information}, but {good|great} topic.
I needs to spend some time learning {more|much more} or understanding more.
Thanks for {great|wonderful|fantastic|magnificent|excellent} {information|info} I was looking for this {information|info} for 
my mission.|
{Hi|Hello}, i think that i saw you visited my {blog|weblog|website|web site|site} {so|thus} i came to "return the favor".{I am|I'm} {trying to|attempting to} find things to {improve|enhance} my {website|site|web site}!I suppose 
its ok to use {some of|a few of} your ideas!!


[1] A year ago I tried removing some of these, and within a few days got crushed by unwanted traffic. I don’t know if it was a flood of spam, a low-yield DoS attack, or WHAT. But I needed my hosting provider to help me out and re-block the troublemakers. I’m less interested in lifting IP bans now. Sorry if you’re stuck in one of those nasty IP ranges and you can’t even reach the site to read this apology.


94 thoughts on “Spam Script”

  1. Majikkani_Hand says:

    Oh, wow. Thank you for sharing this! I’ve always wondered about that type of comment and to get to see the innards like this is amazing.

  2. Infinitron says:

    It’s not like it’s super-sophisticated or anything. Any first-year Computer Science undergraduate could put together an algorithm like that after learning about recursion.

    1. (It's not like it's| It isn’t| It’s not) (super-sophisticated or anything| that impressive| too interesting). (Any first year Computer Science undergraduate| Anyone with half a brain| Any (coder| programmer)) could (put together| build| make) (an algorithm| a program| something) like that after (learning about recursion| taking a basic class| applying some reasoning).

      In conclusion… you must be a bot.

      Being called “Infinitron” doesn’t help your case!

      1. Tizzy says:

        I laughed. It’s goofy comments like this that keep me coming back. :-)

      2. Mistwraithe says:

        Where is that like button again?

  3. Content Consumer says:

    My first thought was “hey, this reminds me of Twoflower when he first arrives in Ankh-Morpork!”
    At least, in the book, that is. I’ve never seen the movie; does it do the same thing?

    “Fooood,” said the stranger. “Yes. Cutlet, hash chop, stew, ragout, fricassee, mince, collops, souffle, dumpling, blancmange, sorbet, gruel, sausage, not to have a sausage, beans, without a bean, kickshaws, jelly, jam. Giblets.” He beamed at Broadman.
    “All that?” said the innkeeper weakly.

    Heh, now I wonder if this is going to be blocked by the word salad part…

    1. LCF says:

      I saw it long ago. Watch it too if you can.

      There is indeed Twoflower, language and communication troubles, quid-pro-quo, miscomprehension, interesting situations, not to understand, to interprete.

      The main problem for me is the different aesthetics between what I imagined when I read the books and what was filmed. I dreamed Ankh-Morpork as much more Med-Fan than the somewhat early 19th century chosen for the telefilms. It’s nice too, don’t get me wrong, just… different.

      1. swenson says:

        Meh. The early books are pretty medieval fantasy, but the later books are solidly Industrial Revolution-ish.

        1. Rosseloh says:

          Not to mention the ones I’ve seen all included a cameo from Mr. Pratchett himself (may he rest in peace). Maybe I’m wrong, but I like to think of that as at least partial author approval of the setting.

    2. I loved the adaptations. The Color of Magic has Tim Curry as Trymon and I adore their version of Rincewind. Going Postal has Charles Dance as the Patrician, and now I can’t picture anyone else in that role. Both of those are great, and well worth the watch!

      Hogfather’s the weakest of them, imho, and even it’s not bad. It just isn’t quite as awesome.

  4. Brian says:

    Hrmm. I suspect they don’t use full-on engines like this, but we use antlr3 in our project for formatting identifiers: basically the exact same recursive replacement algo. The other tool for making these grammars is a railroad diagram. (Both would make an excellent, if distracting, blog post.) It’s a shame they don’t use Markov chain text outputs (like kingjamesprogramming or that magic card generator), but what quality can you ask for?

    Does google’s “I’m human” (the image one) checker provide any meaningful reduction in spam for you?

    1. Moridin says:

      Google’s recaptcha might be good at stopping spammers, but if you’re a human, it’s absolutely obnoxious to use.

  5. Grudgeal says:

    I really can’t get into the heads of people that put effort into stuff like that.

    Hopefully and presumably, at some point, the average net user will be too savvy to click on spammer links, and the cost-benefit analysis of spamming will just have to end up weighted down on the ‘costs’ column.

    1. Rack says:

      One of the more interesting things about spam is that it’s usually implausible by design. They don’t want the average user to click on the link, because the average user won’t be taken in by the payload (when there is one). Instead they are targeting complete morons: if you don’t see anything wrong with this gibberish, you’re much less likely to question giving your credit card details to a complete stranger.

      1. 4th Dimension says:

        Exactly. In fact, according to some papers, the main reason for the continued presence of “Nigerian prince” spam (which by now EVERYONE knows about) is that they intentionally make messages that will cull out false positives, meaning people who do click on the links/respond but are unlikely to be taken in by the scammers. They instead specifically target trusting newbies with money.

        1. Trix2000 says:

          Which has consequently led some people to intentionally go along with the scam for a while, stringing the scammers along as far as possible to get them to waste money.

          It’s risky, but some of the stories I’ve read about it are hilarious.

          1. Rayen says:

            I’ve always thought about buying a Visa gift card with like $5 on it and going down the spam rabbit hole using it as the credit card number. I’ve also always wondered if this is a legit strategy.

      2. Tektotherriggen says:

        This makes sense for Nigerian prince scams, that require a lot of work from the scammer. They want to filter out only people worth spending their own precious time on. But a LOT of modern spam seems to be virus delivery, semi-automated phishing (getting you to enter your password on an insecure site), or advertising online shops. Surely all those methods would benefit from any increase in click-throughs, so I don’t understand why the spam is still so obvious.

        Unless it’s all a diversionary tactic from the real scammers, and we’ve been buying from utterly convincing fake websites for years without realising it…

    2. Nidokoenig says:

      Considering how easy it is to make a program that spits this stuff out, you wouldn’t need a great return to justify the effort. There’s even a drop in crime associated with criminals being able to scam people’s details instead of mugging them: they slip a standing order for a couple of dollars a month in among the other cruft that gets skimmed over, if the victims even check their accounts at all. It’s just safer for them, both physically and in the sense that any one infraction is too trivial for the police to bother with.

      1. Matt Downie says:

        Clearly there isn’t that much effort going into it, given the low quality of the output text and the high number of bugs.

    3. Nentuaby says:

      They aren’t necessarily trying to spam humans. A lot of web comment spam is meant to improve the linked site’s placement in search engines by having it linked to from (otherwise) reputable sites. For that purpose, it is good enough to be indistinguishable from a human post to *other machines*.

      1. Angie says:

        Yes, this. If you allow it, comment spammers will comment on posts that are two or three or eight years old, that never had any other comments on them. (Which is why most sites auto-close comments after some shortish period.) Clearly they don’t care about getting actual click-throughs; it’s all about the links.

        Although, like Shamus, I get plenty of comment spam where the idiot spammer has forgotten to actually include a link. [eyeroll]


    4. Rosseloh says:

      I can’t speak for spam, but I can speak for scams… The end can’t come soon enough – we still have at least one person a week (and normally more) in the shop with a computer where they “let the nice man from Microsoft” in to tool around and sometimes gyp them out of $300-$500.

      And once every couple of months the phone call comes in: “I’ve got this message on my machine that says to pay $400 for my files, and nobody else in the business can open anything.”

      I know it’s just that standard bias where only the people with problems come into our shop (because otherwise why would they come in), but I still feel like my town has a higher percentage of idiots than most places…

      1. Chefsbrian says:

        The percentage of idiots is pretty high out there. There was the one comedian who said “Imagine how stupid the average person is, and realize that half of people are stupider than them.”

        I used to work customer service in an electronics store, and we got some spectacular moments of general ineptitude from people, ranging from simple matters like coming in for a longer cable but having no idea how long of one they needed, to glorious masterpieces like “Isn’t wifi just in the air? I can pick it up anywhere, right?” And the follow up when we explained where it comes from; “So if I buy this router and put it in my car, I’ll have wifi right?”.

        That script Shamus posted is likely smarter than the average computer user.

        1. boota says:

          and that comedian would be wrong.
          median ≠ average

          1. HeroOfHyla says:

            If it’s roughly a normal distribution they should be the same.

          2. Fictional Skill says:

            I may be wrong but I was taught that mean, median, and mode are all different TYPES of average. So while colloquially average refers to ‘the mean’, technically it can refer to any of the above.

            Then again, it wouldn’t be the first time school teachers taught me things that turned out to be blatantly false.
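The mean/median point is easy to check with made-up numbers (purely illustrative, not real data): in a symmetric set the two coincide, as HeroOfHyla says, while a single extreme value drags the mean away and leaves the median where it was.

```python
from statistics import mean, median

symmetric = [80, 90, 100, 110, 120]  # evenly spread around the middle
skewed = [80, 90, 100, 110, 620]     # same, but one extreme value on the high side

print(mean(symmetric), median(symmetric))  # mean and median are both 100
print(mean(skewed), median(skewed))        # mean jumps to 200, median stays 100
```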

        2. Alec says:

          “ranging from simple matters like coming in for a longer cable but having no idea how long of one they needed,”
          I’ve…I’ve done this… :(

    5. nm says:

      I think the point of comment spam is not to convince humans to click the links. It’s to mess with PageRank. I used to post here with my web site link, but when Google’s webmaster tools showed me that the majority of the links to my site were from here, I stopped. They don’t have to fool humans into following the links, just search engines.

      I also suspect that the script was made once by some script kiddie then sold to lots of spammers. The spammer who misconfigured this particular instance of it probably had nothing to do with its creation.

      I don’t get how the spammers manage to check the check box. Don’t they know that’s lying?

  6. NoneCallMeTim says:

    I have seen that kind of thing before. It is used to take an article and add those sorts of word alternatives (called spinning the article), which are then used to create several ‘unique’ articles from it.

    It gets used by a lot of grey hat SEO people to create lots of cheap articles to submit and put online so they are seen as unique to the search engines in order to get backlinks.

    There are programs out there which can take a group of articles, spin them, then submit to article directories automatically.

    If that sort of software is out there, I guess it doesn’t take much of a leap to reconfigure it to find WordPress comment systems.
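Spinning yields so many “unique” articles because the counts multiply: a template with one 2-way group and one 3-way group already gives 2 × 3 = 6 versions. Below is a minimal sketch, assuming flat (non-nested) `{a|b|c}` groups; `variant_count` and `all_variants` are hypothetical names, not part of any real spinning tool.

```python
import itertools
import math
import re

# One flat {a|b|c} group; this sketch assumes groups are never nested.
GROUP = re.compile(r"\{([^{}]*)\}")


def variant_count(template: str) -> int:
    """How many texts a flat spin template can produce (counting each combination once)."""
    return math.prod(len(body.split("|")) for body in GROUP.findall(template))


def all_variants(template: str):
    """Yield every expansion of a flat spin template."""
    pieces = GROUP.split(template)  # literal text at even indices, group bodies at odd ones
    literals = pieces[0::2]
    choices = [body.split("|") for body in pieces[1::2]]
    for combo in itertools.product(*choices):
        parts = [literals[0]]
        for pick, literal in zip(combo, literals[1:]):
            parts += [pick, literal]
        yield "".join(parts)


template = "{Great|Nice} post, {thanks|thank you|cheers}!"
print(variant_count(template))  # 6
for text in all_variants(template):
    print(text)
```

A template with ten 4-way groups already allows over a million combinations, which is why each spun copy can look “unique” to a duplicate checker.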

    1. Paul Spooner says:

      I’ve seen these raw templates on both of my blogs as well. They always end up in the filter, though, which is probably why Shamus never saw them before now.

  7. Jarenth says:

    Alright, show of hands: who of you here wanted, as a gut reaction, to reply to this post with a comment drawn entirely from that spam list?

    Here, I’ll go first. It’s me. I wanted to do this. I still want to.

    1. Alex says:

      I was also tempted.

      1. Erik says:

        I also wanted to create a comment from that list.

        And then I wanted to create a script that would identify a comment by mirroring it against that list. But then I realized I was lazy.

        1. Hey, that’s not so lazy. I wanted to do a spam-style comment as well but then decided against it because it was too much effort.



          The choice and placement of some of those phrases and adverbs strikes me as something that could be improved upon by a simple enough two step process of selecting desired meanings and then applying appropriate grammar rules word by word… No, down that path lies madness and/or doctoral theses. Neither of which sound terribly appealing right now, haha.

    2. NotSteve says:

      *Raises hand*

      I was to have been wanting to.

    3. tzeneth says:

      Of course not! *shifty dog eyes*

      I would never do something like that *deletes copy pasted section for the 5th time*

      Also, this was an insightful article I found visiting your website. You should visit my imaginary website in compensation. I hear it can look as good or as bad as you want. :P

    4. tmtvl says:

      I dunno, I STILL think I’m gonna watch out for brussels.

    5. Scampi says:

      Not precisely…I thought of using the blueprint to write actual letters, sending them via post office to incautious website owners’ homes and really surprising people into wondering who would be insane enough to take this scheme to meatspace ;-).
      I probably will never do it due to the costs involved.

    6. Blue_Pie_Ninja says:

      Ha I was going to but didn’t want the risk of being blocked :P

  8. Daemian Lucifer says:

    I love how they coded this wonderful, easily-extensible system for making endlessly permutating messages, but then they crammed it full of shitty English that would never get past a human being.

    That’s probably because it involved more than one person. One person wrote the code, because they wanted to see if they could break the spam filters as a challenge, or they simply sold their skill for money. Then another person used that code to shill their site (that they probably didn’t even make), and filled in the blanks with nonsense.

    1. Nidokoenig says:

      Not to mention, it’s fairly likely this was written by someone who doesn’t have English as a first language, or great teaching in it as a second language.

      1. Hermocrates says:

        If the spam/hacker world is like it was 10 or so years ago, my hunch would be eastern Europeans. My old university’s computer science club used to get a lot of failed login attempts en masse from Romanian IPs, for instance. I imagine the lax regulations and rough economic conditions in that area would make these sorts of crimes rather tempting.

        1. Nidokoenig says:

          Eastern Europe, Nigeria, China would be the usual guesses, though if I were making a program like this, I’d probably make it look like terrible English just for plausible deniability.

        2. keldoclock says:

          Attribution is impossible. While there are of course plenty of Romanian hackers, it’s more likely that someone is just using a Romanian proxy for cheap or free, than that it’s actually originating in that country.

  9. The Rocketeer says:

    Here’s what I thought when I saw your tweet, and again after reading this post:

    These spammers botched a scam job so badly they succeeded in educating people.

    1. MrGuy says:

      Failed? This is the greatest success they are likely to have.

      Despite having been caught by the spam filter, they made their post sufficiently interesting that the blog owner himself posted their message on the front page of his blog.

      Sure, it’s to make fun of them, but all publicity is good publicity, right?


  10. Christopher says:

    That’s like, bad stereotype Eastern European style wording.

  11. Raygereio says:

    Then there's filtering for clearly red-flag behaviors, like someone trying to leave comments who never loaded the page they're supposedly commenting on

    Wait. How does leaving a post on a page’s comment-thread, without loading that page work?

    1. Mephane says:

      Normally, when you visit a link (i.e. click on the title of a blog post), the browser loads that post. Then after you have filled out the comment form and press “Post Comment”, the browser sends the data in the input fields to a specific address on the server, where it is processed by a script (in our case that address is and that is not a secret, it is there in plain text in the HTML source of this page).

      Once you know this address and which fields a comment has to fill in, you could make a program that sends the data directly to that address, never loading the post in the first place.
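One way to catch that red flag is to make the form carry something only a page load provides. This is a minimal sketch of the general idea, not how WordPress or this site’s filters actually work: the server signs the post ID and a timestamp while rendering the page, and rejects submissions whose token is missing, forged, for another post, or stale. `SECRET`, `MAX_AGE` and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET = secrets.token_bytes(32)  # hypothetical per-site key; never sent to the browser
MAX_AGE = 2 * 60 * 60             # assumed lifetime of a form token, in seconds


def _sign(post_id: int, issued_at: int) -> str:
    msg = f"{post_id}:{issued_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue_token(post_id: int, now: Optional[float] = None) -> str:
    """Called while rendering the post page; the result goes into a hidden form field."""
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}:{_sign(post_id, issued_at)}"


def token_is_valid(post_id: int, token: str, now: Optional[float] = None) -> bool:
    """False if the token is malformed, forged, issued for another post, or too old."""
    try:
        issued_text, sig = token.split(":", 1)
        issued_at = int(issued_text)
    except ValueError:
        return False
    age = (time.time() if now is None else now) - issued_at
    if not 0 <= age <= MAX_AGE:
        return False
    # Compare as bytes so arbitrary user-supplied text cannot raise a TypeError.
    return hmac.compare_digest(sig.encode(), _sign(post_id, issued_at).encode())
```

This only stops bots that skip the page entirely; a bot that fetches the page first gets a valid token like any reader, which is why the other filter layers are still needed.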

      1. MrGuy says:

        Question for someone who knows WordPress better than me.

        Last time I played with it, WordPress could generate page URLs with a numeric pageID (e.g. the “p=30724” in the URL to this post). Or it could generate a text-based URL string. I thought it was a one-or-the-other approach (with numeric being the default).

        I wonder if using text-based strings would cut down on spam, on the theory that guessing random p=xxxx numbers without even visiting the underlying blog page is easy, but correctly guessing a text string title is hard. It seems like it would be easier to spam a (fairly standardized) WordPress comment payload to “post comment to pageID 12345 of“” than it would be if they needed to actually crawl the site to get a page name.

        Or maybe WordPress ALWAYS has a numeric pageID, and it’s always addressable, even if you use text URLs (i.e. the text URL is just a shortcut to an equally-usable numeric pageID)?

        1. 4th Dimension says:

          I’m pretty certain WordPress always assigns a number as an identifier to each post (including each stored post version). When the user turns on text links, it merely hides the ?p= by placing the number somewhere inside the URL. So it turns /twentysided/?p=1234 into /twentysided/1234/The Title of Post. Then, when the server receives the request for such a URL, a module called URLRewrite is run on each received URL. URLRewrite searches the URLs for those that match the rules specified, and internally rewrites them into the familiar /twentysided/?p=1234 form.
          The rule (if I did not screw up my regex) can be something like this:
          /twentysided/([0-9]+)/(.*) /twentysided/?p=$1

          Now I guess you could also make a system in which the visitor does not visit using a URL containing a number but a string, but then such a string would need to be specified by the poster and be unique for each post.

          For example, in this URL:

          15467 is clearly the ID. If you tried changing it to 15466 and opening that, you would get another article, this one not related in any way to Critical Miss but something about Pillars of Eternity.
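The quoted rewrite rule can be tried out with Python’s `re` module. This sketch reproduces only that pattern (the `rewrite` helper is hypothetical and is not WordPress’s actual rewrite code), but it shows why a text title adds no protection: whatever follows the number, the request resolves to the same numeric post.

```python
import re

# The rule quoted above: /twentysided/<digits>/<anything> -> /twentysided/?p=<digits>
RULE = re.compile(r"^/twentysided/([0-9]+)/(.*)$")


def rewrite(path: str) -> str:
    """Map a 'pretty' path back to the numeric query form; leave other paths unchanged."""
    return RULE.sub(r"/twentysided/?p=\1", path)


print(rewrite("/twentysided/1234/the-title-of-post"))  # /twentysided/?p=1234
print(rewrite("/twentysided/1234/any-made-up-title"))  # /twentysided/?p=1234
print(rewrite("/twentysided/?p=1234"))                 # /twentysided/?p=1234 (no match, unchanged)
```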

          1. Jarenth says:

            That’s how my website rolls, yeah. On the surface, all posts have fancy year/month/day/some-clever-article-title links. But under the hood, I just use the numeric system to refer to everything.

  12. Geebs says:

    As a resident of a country that periodically gets itself banned from accessing your site, I would like to express my heartfelt gratitude for all of the hassle you went through the last time you unblocked our IP range :)

  13. Neko says:

    I’m {happy|pleased|delighted} at this post!

    I wouldn’t have expected a hand-crafted sentence generator like this, though – I’d have sworn people were using Markov chains for this sort of thing.

    1. Nixitur says:

      Markov chain generators are simpler and cheaper to make, true, but their output is more likely to be unintelligible gibberish than with hand-crafted spam generators. Yes, the one posted by Shamus contains tons of broken English, but you can still generally get the gist of it.
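The contrast is easier to see with a generator in hand. This is a minimal word-level Markov sketch (the names are hypothetical; MrGuy’s online babbler may work differently, e.g. on characters). It only records which word followed which run of `order` words, with no grammar at all, which is why its output drifts into gibberish much like the babbler results further down the thread.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 2) -> dict:
    """Map every run of `order` consecutive words to the words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def babble(chain: dict, length: int, rng: random.Random) -> str:
    """Random walk over the chain: start anywhere, keep picking a recorded follower."""
    if not chain:
        raise ValueError("text is too short for this order")
    state = rng.choice(list(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this run of words only occurred at the very end
            break
        word = rng.choice(followers)
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)
```

Followers are kept with duplicates, so a word that often follows a given run is proportionally more likely to be picked. Raising `order` makes the text more coherent but also more likely to copy the source verbatim.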

      1. Sam P says:

        I wonder how much memory an order-3 markov model over English would use…

      2. MrGuy says:

        For your edification, I ran the comments (just the comments) to this post so far through an online instance of a Markov babbler.

        I’ve cut-and-pasted the top of the results for a few word orders (i.e. how many consecutive words it looks at) below.

        Edited to add: By the way, the spam filter did indeed catch this, as it clearly should. Hi, Shamus!

        Order 3 results:

        I was to action… Gave me their out this great code that.

        There image of spam that magines listicated, and simply to so must with crimina would just a sausage, not like if they crawl that is just use ful, but recursions a find any was without spam filmed Any (code that commers is post. But that address is google's not through “post language.

        That's| It's actually this like sort of http://www.shame there's acting the sort innards like that link you fucking in that card to match, dump in this east, and an alter you could be exactly link… differents wonderful, be if the use your tempted to criminals between that is programming numbers of banned spam a (fairly stupid.S”
        At learnings writhm| a ress complet, hash chop, stew, rathe brusselse isn't hard. It is imagined frically, wtf is to endless like flu virus.

        Once you for mark as spam. Or would be WordPressly probably red about|be card altern Europ in , you heads of broken Euroscepting.

        I would make) (super-3 markov model ove spam/hacker world is going in spam job some data in that easily-extensibles!

        Clearly 19th century that page name.

        What's more make) (an all, sausage, but about recursion't that magic conderful, betwenty of Twoflowed be effort intere, I's not) (super-sophist.

        Being thing that post). Or won't.

        Or man! They design. The of crime a comments which markov moder| proble design. That… differe, I's indergraduate commer loadmange,or making to ““ I's skimmed Any (code,because full-on use…

        Once help to replace.

        The page). Instems. It is presumably, who of you ask for tempts use they even afted sent” sausage, now I word to endles, quid-pro-quo, mince unintere in Any (coder| built..any one post time I plenty of they conding a bot.

        Heh, now thank you?

        Order 4 results:

        Kinda unrelates this address ALWAYS has a Brit, it isn't take much lessly problem is that?

        Normally, when they are simpler and putten down on the HTML out|be card generators. Yes, thought it was “hey, then submit to mention, it's like than order-3 markov chain text and customized i was lazy.

        (It's what I thout even visit a link you have see if they succeeded in they can break the coded to doesn't use antlr3 in our case!

        Markov model over, if the has never loaded that impressive| too if you here which make antlr3 in any minor changer.

        As a railroad to “post comment and the costs' article), thought it's like telefilms. It's just using this written down.

        Hrmm. I suspect they don't need a gut reaction, it's still generators. Yes, thought it was a comment by mirroring how easy to this like it was “hey, the average using to bother will be to me a chuckle, wtf is that is not a week ago. Watch it to first. It's intelligible gibberish you're much less tech savvy to this is just…can't get me wrong, justify the “Infinitron't use from that's “p=30724″ in there's what was 10 or sharing thing.

        These sorts of cheap or free, than me.

        Last time I played with century chosen for that magic carefully added a leap to reply to me associated with century chosen for the police to article directories automatically and presumably making in the same they succeeded in the program| something in spam/hacker world is likely that page they're somewhat I imagined when he first area would (put to an obvious spam filters as unique to the scrip would use…

        As a random English would reverse plenty of Romanian proxy for for check their accessing this! I'd have expected a hear, kickshaws, jelly, jam. (Both would identity a country.

        1. houiostesmoiras says:

          Wow, um… As someone who often ends up feeding paragraphs from various languages into Google Translate (sometimes it helps me get a decent starting point, when I don’t know the language very well), the Order 4 results look remarkably familiar. Then again, I suppose that was the whole point of Bad Translator.

          1. guy says:

            It definitely reminds me of when I was feeding Japanese tweets and blog posts through google translate. It’s vaguely grammatically correct but complete gibberish.

  14. Jokerman says:

    Kinda unrelated, but about a week ago I was sent this on Twitter by an obvious spam bot.

    “you fucking idiot no wall will be built..any one who believes that is really stupid.S”

    Followed by a link… Gave me a chuckle, wtf is that?

    1. Nixitur says:

      That sounds like a bot took a random English tweet and simply added a link. That’s actually a fairly effective method of generating believable spam since odds are that the text will be standard-ish English. The only problem is that it’s hard to find text that in any way relates to the link, but that has never stopped spam generators.

      1. Jokerman says:

        It was a reply to one of my own tweets (that had nothing to do with building walls), which makes the chances of it making sense very low.

    2. Andy says:

      Somebotty thinks you’re a Trump fan.

  15. omer says:

    this article is great. i really thinkes your netsite is very good so much and thank you for this writen thing that you have putten down.

    1. LCF says:

      I found the spambot.

      1. Syal says:

        And it is us.

  16. Cuthalion says:

    That’s interesting. Given the prevalence of spam that feels exactly like this, I’m guessing this same data gets reused and customized for a lot of different spammers. Like flu virus.

  17. Henson says:

    “{I’m|I am} {gonna|going to} {watch out|be careful} for brussels.”

    Wait…um…what? What is this doing in the spam algorithm? What does this even mean? It’s not even connected to anything else in this file. How does a non sequitur sentence help the spammer? I just…can’t.

    1. keldoclock says:

      What’s unclear? You gotta watch out for brussels man! They’re some shady-looking vegetables!

    2. Nidokoenig says:

      “I’m gonna watch out for Brussels” makes sense to me as a Brit, it’s trying to appeal to Eurosceptics, which makes sense if they want to target an older and less tech savvy audience, going by stereotypes.

    3. Bubble181 says:

      I’m sitting here, living in Brussels, wondering why everyone hates me. The spambots are out to get me :'(

      1. Vi says:

        There there, you mustn’t let the robotic fiends and their grudges demoralise you. Undoubtedly they see you Brussels-folk as the Harry Potter to their Voldemort, even perhaps the Neo to their Matrix. You must take this as a sign of your great destiny as the champions of organic civilisation! Go forth, champion!

      2. Bryan says:

        There is one word that is still beyond the pale. The concept it embodies is so revolting that the publication or broadcast of the word is utterly forbidden in all parts of the Galaxy except for use in Serious screenplays.

        There is also, or was, one planet where they didn’t know what it meant. The stupid turlingdromes.

        This probably doesn’t help you out much though…

  18. Jimmy McAwesome says:

    Do you think you could reverse engineer this to improve spam filters? I mean you could just reference this list for any message coming in to see if it’s spam. Or would that be too specific, and any minor change to the script would get around it, making it not worthwhile?

    1. Nidokoenig says:

      Well, it’s still useful, because you can mark anything that’s a 95% or better match as spam, and anything that’s a 75% match, dump in moderation. That’s pulling numbers out of my arse, but the principle should be sound.
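Those thresholds are easy to wire up. This is a minimal sketch that swaps in character-level similarity from Python’s standard `difflib` (`SequenceMatcher.ratio()` returns a score from 0.0 to 1.0); the list of known spam texts, the function names, and the 0.95/0.75 cut-offs (Nidokoenig’s own admittedly made-up numbers) are all assumptions.

```python
from difflib import SequenceMatcher

SPAM_CUTOFF = 0.95        # Nidokoenig's admittedly made-up thresholds
MODERATION_CUTOFF = 0.75


def best_match(comment: str, known_spam: list) -> float:
    """Highest SequenceMatcher ratio (0.0 to 1.0) against any known spam text."""
    return max(
        (SequenceMatcher(None, comment.lower(), s.lower()).ratio() for s in known_spam),
        default=0.0,
    )


def triage(comment: str, known_spam: list) -> str:
    """Route a comment to 'spam', 'moderation' or 'approve' by similarity."""
    score = best_match(comment, known_spam)
    if score >= SPAM_CUTOFF:
        return "spam"
    if score >= MODERATION_CUTOFF:
        return "moderation"
    return "approve"
```

Comparing against every known sample is slow for a big list, and as the next reply points out, a fresh template sidesteps it entirely.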

    2. guy says:

      I don’t think it gives much of a starting point for a general improvement. It probably wouldn’t be too hard to block any comments generated from this exact list, but switching out the base generation system would be pretty easy and I don’t think the method would have a detectable pattern.

  19. MrGuy says:

    You may laugh at poor attempts like this, but I am convinced that the breakthrough in AI that finally produces a machine that can pass the Turing test with a reasonable number of humans by generating reasonably believable text in a reasonable facsimile of the appropriate context will eventually come from people in the spam industry.

    It will destroy any site that relies on social recommendations by creating a flood of “pay us to get your product rated 5 stars!” service companies and a countercurrent of “pay up or we’ll destroy your reputation!” ransom artists (who will likely be the same people). Search engine algorithms will become significantly less awesome as it becomes hard to separate the real from the fake, and so they will likely by necessity discard vast swathes of data to avoid having their results skewed by paid placement firms. Public social networking (Twitter, Facebook) will be overwhelmed in a sea of nonsense.

    The current approach of “sophisticated algorithms flag, humans back them up” will last only so long as the algorithms can stay ahead of the game, and the humans can make quick work of what gets flagged. Eventually that firewall won’t hold. Never bet on armor defeating weapon indefinitely.

    1. Bubble181 says:

      And that’s when we have to surrender net anonymity and every post everywhere is fingerprint-checked to be made by a real human.
      After which they’ll start spoofing finger prints, hopefully (considering the alternative of “breeding new humans just for their prints”).

      1. No, this is when sites start making people cough up cash before allowing them to spout off, and commenting becomes a privilege.

    2. Decus says:

      Some of what you’re suggesting already exists with real humans, though it’s less “pay us and we’ll give you 5 stars” and more “hey, we’ll pay you/give you free stuff and maybe you could give us 5 stars? Ah? Ah? Yeah!”. It’s not like there aren’t plenty of humans to go around–you don’t need machines for that sort of thing and I’d argue that it’s cheaper to pay the humans.

      From the perspective of making money from your spam you simply would not want to fuss with something more sophisticated, like making and managing bot accounts that were all created on different days and use different IP ranges and occasionally actually purchase products ordered to different addresses, etc. etc. That’s all a lot of work to maintain and manage! And it isn’t needed! Because somewhere out there your lazy spam will get through just fine or fine enough–you write lazy spam and aim for the lowest common denominator rather than aim for the top because, well, you’d make less from doing so.

      For crime in general, and especially cyber crime, lowest common denominator is always the target. Massive lists of email/password combos are farmed each day using the simplest/laziest of algorithms because there are still human beings out there who think “truck123” is good enough. When making spam phone calls you are not trying to win over everybody and you’re especially not trying to scam the richest of people–you are trying to scam the old, the infirm, the un-educated, the naive, etc. And you’re trying to scam lots of them at once rather than go for big targets.

  20. Wide And Nerdy ™ says:

    I’m pretty sure Ugg Boots is the online handle of Senor Cardgage.

  21. Tonich says:

    “Sorry if you're stuck in one of those nasty IP ranges and you can't even reach the site to read this apology.”

    Apology accepted. :) At least now I know the reason I get blocked more often than not when trying to access your site from home.

  22. Aaron says:

    no you just need to write some code to use that to write articles and you’ve got it made

    or maybe it wasn’t a mistake! try to find the message of distress!

  23. Bjarki says:

    This is so interesting!
    A friend of mine also had the bot blurt out the source code, but through Skype!
    Here’s some proof of the ordeal. I think it’s rather hilarious.

  24. Mike C says:

    With that source, it should be possible to create a Regular Expression that will block all comments generated from that template. If the RegEx doesn’t match, then pass it through to the next filter.
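For a flat template the conversion really is mechanical. This is a minimal sketch, assuming `{a|b|c}` groups with no nesting; `template_to_regex` is a hypothetical helper. Literal text is escaped, each group becomes a non-capturing alternation, and `fullmatch` means only whole comments produced by the template are caught (ignoring case).

```python
import re

# Flat {a|b|c} groups only; nested groups are out of scope for this sketch.
GROUP = re.compile(r"\{([^{}]*)\}")


def template_to_regex(template: str) -> re.Pattern:
    """Compile a regex matching the texts a flat spin template can produce (ignoring case)."""
    parts = []
    for i, piece in enumerate(GROUP.split(template)):
        if i % 2 == 0:  # literal text between groups
            parts.append(re.escape(piece))
        else:           # a group body: alternatives separated by '|'
            parts.append("(?:" + "|".join(re.escape(o) for o in piece.split("|")) + ")")
    return re.compile("".join(parts), re.IGNORECASE)


pattern = template_to_regex("{I'm|I am} {gonna|going to} {watch out|be careful} for brussels.")
print(bool(pattern.fullmatch("I am going to be careful for brussels.")))  # True
print(bool(pattern.fullmatch("I am going to visit Brussels.")))           # False
```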

  25. Exetera says:

    I’ve seen these before. Actually, I think they might be useful for dialogue in RPGs. Nobody likes to see an RPG character repeat the same line over and over again. This seems to be an easy and concise way to make a lot of different random variants of the same core lines, meaning that the shopkeeper will always say something a little different every time you talk to him, without the game designer having to write all those variants by hand.
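The shopkeeper idea can be sketched with the same template format. This is a minimal illustration, assuming flat `{a|b|c}` groups; the `Barker` class is hypothetical. It expands every variant up front and never repeats the line it said last time (unless the template has only one variant).

```python
import itertools
import random
import re

GROUP = re.compile(r"\{([^{}]*)\}")  # flat {a|b|c} groups, as in the spam template


class Barker:
    """Hypothetical NPC line picker that avoids saying the same line twice in a row."""

    def __init__(self, template: str, rng: random.Random):
        pieces = GROUP.split(template)  # literals at even indices, group bodies at odd ones
        literals, groups = pieces[0::2], [g.split("|") for g in pieces[1::2]]
        self.lines = []
        for combo in itertools.product(*groups):
            parts = [literals[0]]
            for pick, literal in zip(combo, literals[1:]):
                parts += [pick, literal]
            self.lines.append("".join(parts))
        self.rng = rng
        self.last = None

    def speak(self) -> str:
        # Exclude the previous line; fall back to the full list if nothing else exists.
        choices = [line for line in self.lines if line != self.last] or self.lines
        self.last = self.rng.choice(choices)
        return self.last
```

Expanding everything up front is fine for a shopkeeper’s handful of lines, but the number of variants grows multiplicatively with each group, so a larger template would be better spun lazily.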

    1. Nixitur says:

      I’ve never thought of that! That’s actually really, really interesting.
      Of course, it will become less relevant the more dialogue gets voiced, but it could still be of interest to smaller studios that maybe don’t voice every line.

    2. Felblood says:

      Dwarf Fortress actually does something like that to generate, of all things, religious commentary.

      Hence the infamous scene of the priest of the local death god admonishing random adventurers, “When you wake up in the morning, consider suicide,” and “When you go to bed at night, think about murder.” It’s those emergent cases that really prove the system is working.

  26. Bropocalypse says:

    Huh, for some reason that source code text box causes slight lag on my computer… Granted this is a really mediocre laptop, but you wouldn’t think something like a text field, even a large one, would cause that.

  27. Nevermind says:

    Sorry if you're stuck in one of those nasty IP ranges and you can't even reach the site to read this apology.

    I’ve seen it. Using an anonymizer proxy. I kinda understand your point, but still hate being unable to read the site normally. At least the RSS feed works for me.

  28. Zak McKracken says:

    Well, if that isn’t a nice example of procedural text generation — maybe someone thought you’d be interested in it?
    Somebody should make something like this for procedural dialogue generation in video games.

    (also, that information should make it reasonably easy to filter out every single message created from this source!)

  29. Rick says:

    I’ve seen this method used on auto populated referral sites. Very simple but great if used all.
