Automated comment spam

Bloke · 2010-09-07 09:03:37

Got hit today for what I believe is the first time by automated comment spam. Most of the time I get the odd manually-submitted comment which is easy to deal with and a lame attempt at best by some company with too much time on their hands to pay someone to do it.

Today was the first concerted effort by a script. How do I know? A few telltale signs:

1) The comments were exactly 16 minutes apart, like clockwork (although some seem to have missed or been rejected and therefore occasionally they’re posted 32 minutes apart). If it’s a human it’s The Terminator and I’ve then got more immediate problems; like answering the door and saying Sarah Connor lives in the next town

2) My Visitor Logs have pairs of POST attempts at the same blog article every 16 minutes. These are recorded first as a 200 status and at exactly the same time (or 1 second later depending on granularity) a 302 status. Would love to see what the contents of these POST attempts were, but I’m not sure where to find this out. Will trawl the server logs to see if there’s any residue

The e-mail address is always a fake gmail.com or yahoo.com address like asdfwjy12 or something, so the domain checks out. I let the spammer post 42 messages so I had plenty of entropy to analyse and then cut them off (luckily they weren’t using an IP address randomizer or a proxy).

Anyone else had this kind of thing? Anyone have any further insight into how I can study this kind of attack and find out more about it? What sort of technique is a script using to bypass the forced preview? My guess is the script must be submitting the comment and then resubmitting like a normal user would, thereby passing the nonce check. This may explain the double-entries in the Visitor Logs (but not the 302 status of the 2nd attempt, hmmm).

If so, this is targeted specifically at TXP or other forced-comment-preview systems and we perhaps need to think about being more clever; possibly using time domain analysis or loosely coupled sets (which I think I’ve mentioned years ago on this forum somewhere) to filter out obvious rapid-fire comment attempts at frequent intervals. I’m sure if I study this that a comment spam plugin can be devised to thwart this type of attack and its variants.
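As a rough illustration of the "time domain" idea, the sketch below flags a run of comment timestamps whose gaps are all near-multiples of one base interval (16 minutes here, with the occasional 32-minute gap from a missed post). It is standalone Python, not Textpattern plugin code; the function name `looks_periodic`, the `tolerance` and `min_events` parameters, and the use of the median gap as the base interval are all assumptions made for the example.

```python
from datetime import datetime
from statistics import median


def looks_periodic(timestamps, tolerance=5.0, min_events=4):
    """Return True if every gap between events is close to a whole
    multiple of the median gap (assumes most gaps are single intervals)."""
    if len(timestamps) < min_events:
        return False  # too few events to judge
    times = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    base = median(gaps)
    if base <= 0:
        return False
    for gap in gaps:
        multiple = round(gap / base)
        # A gap must be 1x, 2x, 3x... the base interval, within tolerance seconds
        if multiple < 1 or abs(gap - multiple * base) > tolerance:
            return False
    return True
```

Fed the comment times for a single IP address or e-mail address, a check like this would catch the clockwork pattern while leaving irregular human posting alone; a bot that adds random jitter would need a looser test.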

Thoughts? Avenues of exploration?

Thanks in advance.

(EDIT: removed dumbass mention of ZCR)

Last edited by Bloke (2010-09-07 09:11:29)

Gocom · 2010-09-07 09:59:58

Bloke wrote:

This may explain the double-entries in the Visitor Logs (but not the 302 status of the 2nd attempt, hmmm).

Textpattern uses redirects in the comment-saving process.

What does the spammer send to the server? It might be a bot that actually uses a full rendering engine. Depending on how well it’s coded, it might be almost unstoppable, like Terminator. Now, ‘hide ya kids, hide ya wife’.

The forced preview isn’t going to stop everything. Just the mindless bots that just send forms without thinking first.

Bloke · 2010-09-07 10:14:33

Gocom wrote:

Textpattern uses redirects in the comment-saving process

D’oh! Thanks. That explains that bit then. I get so few comments (*sniff, blub* must write something more interesting) I never noticed that before. Plus the comments area of the core is still voodoo to me until I get my head round it. One day…

It might be a bot that actually uses a full rendering engine.

Could be. I still haven’t found what was sent exactly (search ongoing). Here is one attempt from the server logs though:

79.142.65.175 - - [07/Sep/2010:08:21:01 +0100] "POST /blog/corporate-tautology HTTP/1.1" 200 32733 "http://stefdawson.com/blog/corporate-tautology" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
79.142.65.175 - - [07/Sep/2010:08:21:01 +0100] "POST /blog/corporate-tautology HTTP/1.1" 302 5 "http://stefdawson.com/blog/corporate-tautology#cpreview" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
79.142.65.175 - - [07/Sep/2010:08:21:02 +0100] "GET /blog/corporate-tautology?commented=1 HTTP/1.1" 200 30203 "http://stefdawson.com/blog/corporate-tautology#cpreview" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Note: POST (200), POST (302), GET (200) within 1 second. I guess that mimics what I’d see as a standard request stream from a real user? i.e. POST the comment, POST preview status and then GET the commented article contents. Or is there something in that exchange that screams bot? Perhaps the fact it all occurs within 1 second gives it away? Unless someone is posting “lol” as a response (in which case, it’s not much of a comment anyway!) and can preview, scroll, submit in less than a second.

This looks like a failed one (403 status):

79.142.65.175 - - [07/Sep/2010:09:59:27 +0100] "POST /blog/corporate-tautology HTTP/1.1" 403 4571 "http://stefdawson.com/blog/corporate-tautology" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

The forced preview isn’t going to stop everything.

Indeed. That’s why I wondered if a plugin could focus on other things that make us human — such as the fact it takes time to write something and interact with a form? If someone is bypassing the form altogether and just submitting a POST request (twice: once for preview + once for submit) within 1 second, that ought to be detectable too.
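To make that concrete, here is a minimal sketch that scans access-log lines like the ones above and reports any POST answered with 200 that is followed by a POST answered with 302 from the same IP on the same path within a couple of seconds. Reading the 200 as the preview and the 302 as the final submit is taken from this thread rather than verified against the Textpattern source, and the names `parse_line` and `fast_submissions` and the `min_seconds` threshold are made up for the example. It is standalone Python, not a plugin.

```python
import re
from datetime import datetime

# Matches the start of an Apache-style access log line (as in the excerpts above)
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def parse_line(line):
    """Return (ip, timestamp, method, path, status) or None if unparseable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("method"), m.group("path"), int(m.group("status"))


def fast_submissions(lines, min_seconds=2.0):
    """List (ip, path, elapsed_seconds) for preview/submit pairs that
    completed faster than min_seconds."""
    last_preview = {}  # (ip, path) -> time of the most recent 200 POST
    hits = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        ip, ts, method, path, status = rec
        if method != "POST":
            continue
        key = (ip, path)
        if status == 200:
            last_preview[key] = ts
        elif status == 302 and key in last_preview:
            elapsed = (ts - last_preview.pop(key)).total_seconds()
            if elapsed < min_seconds:
                hits.append((ip, path, elapsed))
    return hits
```

Because the log only has one-second granularity, the threshold cannot usefully go below a second or two, and this only measures preview-to-submit time, which a quick human can also manage.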

Last edited by Bloke (2010-09-07 10:37:01)

Gocom · 2010-09-07 13:06:57

Bloke wrote:

My Visitor Logs have pairs of POST attempts at the same blog article every 16 minutes. […] plugin could focus

Rah_comment_spam has features that might help. For example, it can limit posting frequency. It’s not really ideal to allow only 1 comment in 20 minutes per user, but it can block such comment spam. It also has filters which can be used to block email providers such as hotmail.com and yahoo.com.
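For illustration only, a posting-frequency limit of this kind can be sketched in a few lines. This is not rah_comment_spam's code; the class name `PostingLimiter`, keying on IP address, and the in-memory dictionary are all assumptions for the example (a real plugin would persist the last-post times somewhere).

```python
class PostingLimiter:
    """Allow at most one accepted comment per key (e.g. IP) per interval."""

    def __init__(self, min_interval_seconds=1200):  # 1200 s = 20 minutes
        self.min_interval = min_interval_seconds
        self._last_accepted = {}  # key -> time of last accepted comment

    def allow(self, key, now):
        """Return True and record the post if enough time has passed."""
        last = self._last_accepted.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted comment
        self._last_accepted[key] = now
        return True
```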

Note: POST (200), POST (302), GET (200) within 1 second. I guess that mimics what I’d see as a standard request stream from a real user?

Yep. 1 second might even be more than what it takes to press the “Submit” button. The 1 second is from hitting “Preview” to “Submit”.

Unless someone is posting “lol” as a response (in which case, it’s not much of a comment anyway!) and can preview, scroll, submit in less than a second.

You say scroll? You don’t use the #cpreview anchor? How do your users even know about the preview w/o seeing the preview and the “yes, I would want to submit this” ;-)

Indeed. That’s why I wondered if a plugin could focus on other things that make us human — such as the fact it takes time to write something and interact with a form? If someone is bypassing the form altogether and just submitting a POST request (twice: once for preview + once for submit) within 1 second, that ought to be detectable too.

That might be a good idea. The problem is that the step between the writing and saving is the preview. And pressing the submit button takes just a second.

The time between the initial article request and submitting the comment could be more usable. The problem is that every article request then has to be logged, and if the article gets a lot of hits, the database fills up with IPs.

I’ll see whether I can add such a feature to rah_comment_spam. I probably could use TXP’s own logging system… or maybe not, because that would require enabling the logs… maybe I could use the nonce instead? It does include the creation date. But then again it’s linked to the preview and isn’t exactly user aware.

Hmm, by using JavaScript the time between comment field focus and preview could be counted, and that would save database space… but that really isn’t an option either. Maybe I’ll just write my own logging mechanism which logs every article hit or something. If the feature is even needed, hmm.

Last edited by Gocom (2010-09-07 13:09:12)

Bloke · 2010-09-07 13:24:57

Gocom wrote:

Rah_comment_spam has features that might help.

I’ll take a look, thanks. Forgot about that one (shame on me). Like you say, 1 comment in 20 mins per IP might not be a good enough measure. Even one message per blog post per 20 minutes might stop a comment exchange. Shame.

I’ve installed mem_akismet using AntiSpam.TypePad for now as a stop-gap, so it should at least filter out obvious messages by content.

You say scroll? You don’t use the #cpreview anchor?

Ah yeah, I do. Sometimes comments are quite large though and you need to scroll a bit below the comment form (which takes up a fair amount of space on my site too) to see the Submit button.

The time between the initial article request and submitting the comment could be more usable.

Yes, that’s the time I was thinking about. It takes time to write a post, time to preview and time to submit. If it was below a user-configurable number of seconds you could bounce the post. I used that quite successfully on my previous (non-TXP) site to filter out bots who tried to use the contact form.

As you say, you need to check the timestamp from the start of writing to the end of submitting. The start of writing is not necessarily the page delivery time because someone often takes time to read the post itself (unless they’re replying to a reply).

The JS route would probably work, but a bot would probably not trigger focus() so the start time would never be initialised. In that case I suppose the absence of any time info (or only seeing partial time info) submitted along with the form would be indication itself of a non-human.

It’d be cool if it could be done locally instead of filling up a database table. After all, you don’t really care about IP or any per-user stats in this case, it’s just a “humanity” check. Perhaps by stashing the form-filling start time in a hidden form field (suitably nonced to prevent injection), submitting it with the form and comparing it server side with “now” upon final submit? If the difference between the two times is less than the configured timeout period, bounce the message.

Gocom · 2010-09-07 14:50:30

Bloke wrote:

Perhaps by stashing the form-filling start time in a hidden form field (suitably nonced to prevent injection), submitting it with the form and comparing it server side with “now” upon final submit? If the difference between the two times is less than the configured timeout period, bounce the message.

Ah, that sounds so much better :-) Do you mean that the comment form will include something like:

<?php $time = time(); ?>
<input type="hidden" name="time" value="<?php echo $time; ?>" />
<input type="hidden" name="timenonce" value="<?php echo md5($nonce . $time); ?>" />
<input type="hidden" name="whatnonce" value="<?php echo $nonce; ?>" />

And then checking:

if (md5($whatnonce . $time) === $timenonce && $time < time() - $interval) {
	// YouHuman: the token is intact and at least $interval seconds have passed
}

Last edited by Gocom (2010-09-07 14:57:29)

Bloke · 2010-09-07 15:06:52

Gocom wrote:

Do you mean that the comment form will include <snip>

Yep! If it’s doable from a plugin then I think that kind of logic should work quite well. Off the top of my head I can’t see any reason that approach won’t work.

Gocom · 2010-09-07 16:19:37

Bloke wrote:

Yep! If it’s doable from a plugin then I think that kind of logic should work quite well. Off the top of my head I can’t see any reason that approach won’t work.

Yes, should be.

Although, the txp_discuss_nonce table will get slight nonce spam, as this requires that the nonce is generated on initial page load, not on the preview step. Not really different from IP logging.

Otherwise getComment() would work but nonce isn’t available from the start. One of the alternatives is to generate a permanent nonce on plugin install which is then shared with users. Hmm.

Gocom · 2010-09-15 04:51:12

It indeed was doable. Now the new version of rah_comment_spam, 0.5, includes this sort of feature.

I ended up using a single “hard-codish” secret key for every comment form call, because the nonce is needed every time the article page is requested. Constant deleting and inserting doesn’t sound nice. The secret key is created when the feature is enabled in the plugin’s preferences. The secret can be refreshed by disabling and re-enabling the feature.
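A minimal sketch of the general idea, for anyone curious: sign the page-load time with the shared secret, put the result in a hidden field, and on submit check both the signature and the elapsed time. This is standalone Python and not the plugin's actual code; the function names, the `timestamp:signature` token layout, the `min_seconds` default, and the use of HMAC-SHA256 (instead of the md5 concatenation discussed earlier in the thread) are all assumptions for the example.

```python
import hashlib
import hmac
import time


def make_token(secret, issued_at=None):
    """Build the hidden-field value: '<unix time>:<signature>'."""
    issued_at = int(time.time()) if issued_at is None else int(issued_at)
    sig = hmac.new(secret, str(issued_at).encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{issued_at}:{sig}"


def is_plausibly_human(secret, token, min_seconds=5, now=None):
    """True if the token is untampered and at least min_seconds old."""
    now = time.time() if now is None else now
    try:
        ts_text, sig = token.split(":", 1)
        issued_at = int(ts_text)
    except (AttributeError, ValueError):
        return False  # missing or malformed token
    expected = hmac.new(secret, ts_text.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison so the signature cannot be guessed byte by byte
    if not hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8")):
        return False
    return now - issued_at >= min_seconds
```

Nothing is stored per visitor, so there is no table to fill up. The trade-off is that a bot can fetch the page, wait out the interval, and then post, and a token stays valid until the secret is refreshed.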

Textpattern CMS

Textpattern CMS support forum
