Textpattern CMS support forum
#16 2005-01-22 00:59:20
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> I saw this method for stopping referer spam a couple days ago; it’s not a bad idea but it’d be nice to have something which scales.
It’s fine if you like spending hours each day playing whack-a-mole.
Patience, fellas.
Alex
Re: Automatic Referral spam Blocking.
And if you want to try and counter-program referral spammers, know your enemy and their techniques.
“My guess is that spammers will start using HTTPS next. That way madness lies, since the SSL handshake alone will bring your average server to its knees if it starts happening in batches of 80 or so.”
Oh, geez…
TextPattern user since 04/04/04
Re: Automatic Referral spam Blocking.
I’ll have the first version available by Sunday night. This initial release will have only a keyword-based filter, editable through a web form (txp?), and will not be for ‘ransom’.
Later versions will support Bayesian filtering and basic pattern detection (which is proving to be the most difficult part).
Re: Automatic Referral spam Blocking.
Interesting discovery. I’ve been researching the tools that these idiots use, and not a single one of them “follows” any redirect as they operate on a basic HTTP level and just issue GET requests to as many URLs as possible.
Following from this, I’ve realized it would be possible to write a simple script that, upon loading, redirects the legitimate visitor to the page they intended to visit.
Benefits?
- Bandwidth.
- Ability to log who does /not/ redirect and ban the ip at the htaccess level.
I can see that using this would pretty much destroy the whole point of referrers, but they are pretty much useless now anyway.
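The redirect idea above could be sketched roughly as follows. This is only an illustration in Python, not the script described in the post: the `RedirectGate` class, the `gate` query parameter and the in-memory bookkeeping are all assumptions made for the example. It also only catches clients that never follow a redirect.

```python
import secrets
from urllib.parse import urlencode, parse_qs


class RedirectGate:
    """Bounce each first request through a redirect and note who follows it."""

    def __init__(self):
        self.pending = {}    # ip -> token handed out, redirect not yet followed
        self.passed = set()  # ips that came back with the right token

    def handle(self, ip, path, query=""):
        """Return (status, location); location is None when the page may be served."""
        if ip in self.passed:
            return 200, None
        token = parse_qs(query).get("gate", [None])[0]
        if token is not None and self.pending.get(ip) == token:
            del self.pending[ip]
            self.passed.add(ip)
            return 200, None
        # First contact (or wrong token): issue a fresh token and redirect.
        token = secrets.token_hex(8)
        self.pending[ip] = token
        return 302, path + "?" + urlencode({"gate": token})

    def non_followers(self):
        """IPs that were redirected but never came back: candidates for a ban."""
        return sorted(self.pending)


gate = RedirectGate()
status, location = gate.handle("203.0.113.5", "/article/1925")
gate.handle("198.51.100.7", "/")          # this client never follows the redirect
path, _, query = location.partition("?")
print(gate.handle("203.0.113.5", path, query))
print(gate.non_followers())
```

The list returned by `non_followers()` is what would feed an htaccess-level ban; how long to wait before treating a pending IP as a non-follower is left open here.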
#20 2005-01-26 23:16:19
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> TheEric wrote:
> Interesting discovery. I’ve been researching the tools that these idiots use, and not a single one of them “follows” any redirect as they operate on a basic HTTP level and just issue GET requests to as many URLs as possible.
My research shows a very different result: most referrer spammers follow redirects.
Alex
Re: Automatic Referral spam Blocking.
> My research shows a very different result: most referrer spammers follow redirects.
Nope. I obtained a copy of one of the primary Windows applications used, “Reffy”, and it simply issues the HTTP GET with the appropriate information and then closes the connection. End result? Referrer spam and no redirect.
Now, the great thing about any client that MAY follow the redirect? Redirect it back to itself.
Win Win.
Re: Automatic Referral spam Blocking.
TheEric, two things to think about: If Zem is seeing referrers that follow redirects, you can’t dismiss that with a nope. Obviously there are people spamming that don’t use “Reffy”. Second, if you redirect clients that follow redirects, how do you propose to have legitimate clients get to your site?
#23 2005-01-27 00:25:30
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> > most referrer spammers follow redirects.
> Nope.
213.172.36.62 - - [14/Dec/2004:23:39:25 -0500] "GET /article/1925 HTTP/1.0" 301 0 "http://www.texasproptax.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT5.2; .NET CLR 1.1.4322)"
213.172.36.62 - - [14/Dec/2004:23:39:31 -0500] "GET /article/1925/wired-fluffy-bunny-no-longer-energized HTTP/1.0" 200 12944 "http://www.texasproptax.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"
...etc. Almost all spam that’s hitting my honeypot displays similar capabilities (and then some).
They moved on from Reffy a while back. Whatever tool they’re using now is far more sophisticated. More details here, here, and here.
Make no mistake: these guys are clever, motivated and well funded. They won’t fall for a circular redirect, and they know exactly how to turn your keyword blocker into a quagmire.
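For anyone wanting to check their own logs for the pattern in the two log lines above (a 301 followed by a 200 from the same client), here is a minimal sketch in Python. It assumes Apache combined log format with plain ASCII quotes, and the `redirect_followers` helper is a made-up name for this example. Access logs do not record the redirect target, so "a later 200 from the same IP and referrer" is only a heuristic.

```python
import re

# Apache combined log format, as in the two lines quoted in the post.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def redirect_followers(lines):
    """Return (ip, referrer) pairs that got a 301/302 and later a 200."""
    redirected = set()
    followers = set()
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines that are not in combined format
        key = (m["ip"], m["referrer"])
        status = int(m["status"])
        if status in (301, 302):
            redirected.add(key)
        elif status == 200 and key in redirected:
            followers.add(key)
    return followers


SAMPLE = [
    '213.172.36.62 - - [14/Dec/2004:23:39:25 -0500] "GET /article/1925 HTTP/1.0" '
    '301 0 "http://www.texasproptax.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '213.172.36.62 - - [14/Dec/2004:23:39:31 -0500] '
    '"GET /article/1925/wired-fluffy-bunny-no-longer-energized HTTP/1.0" '
    '200 12944 "http://www.texasproptax.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]
print(redirect_followers(SAMPLE))
```

The user agent strings in `SAMPLE` are shortened for readability; the rest of each line follows the log excerpt in the post.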
Alex
#24 2005-01-30 22:39:20
- Kibitzer
- Member

- From: Melbourne, Australia
- Registered: 2004-05-24
- Posts: 44
Re: Automatic Referral spam Blocking.
I saw this article about blocking referral spam and thought it might be interesting. I think it’s what you guys are talking about; the article was helpful in explaining the problem to me. I know you’re looking for an automatic solution but perhaps this’ll be useful to others in the meantime.
VC200 Member #69 — VCTWO Member — Mixed Gorilla
“YES!” “That would be an ecumenical matter!”
#25 2005-01-31 00:10:52
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
Kibitzer,
Spammers already change their URLs and IPs frequently in order to turn manual blacklists into a game of whack-a-mole.
Alex
Re: Automatic Referral spam Blocking.
66.55.149.35.choopa.net - - [04/Feb/2005:13:41:21 -0500] "GET / HTTP/1.1" 200 1708 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:42:21 -0500] "GET / HTTP/1.1" 200 1708 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:44:16 -0500] "GET / HTTP/1.1" 404 52 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:44:34 -0500] "GET / HTTP/1.1" 404 52 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
As you can see, the automatic referrer spam detection is working. I have a preliminary version working on my website and it has cut my referrer spam by a large margin. Some sneak by, but I’m watching my logs to see whether it’s a new trick or something I missed in my code.
“Defer” (as I’m calling it) has several tricks up its sleeve. I decided against any Bayesian filtering and instead rely on a set of popular keywords, combined with a ‘logic system’ that checks how frequently a referrer has come in and how many pages that specific referrer IP visited. If it looks suspicious, it sets a 404 and adds that referral address to the list of existing keywords.
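A rough sketch of how a keyword list plus that kind of ‘logic system’ might fit together is below. It is not Defer's code (which has not been posted): the `ReferrerFilter` class, the `max_hits` and `min_pages` thresholds and the way the two checks are combined are all guesses made for illustration, written in Python.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


class ReferrerFilter:
    """Keyword list plus a simple frequency check (thresholds are assumptions)."""

    def __init__(self, keywords, max_hits=3, min_pages=2):
        self.keywords = {k.lower() for k in keywords}
        self.max_hits = max_hits        # hits allowed per referrer host
        self.min_pages = min_pages      # distinct pages a real reader might view
        self.hits = Counter()           # referrer host -> request count
        self.pages = defaultdict(set)   # ip -> distinct paths requested

    def check(self, ip, path, referrer):
        """Return 404 for a suspicious request, otherwise 200."""
        host = (urlparse(referrer).hostname or "").lower()
        if not host:
            return 200  # no referrer, nothing to judge
        if any(k in referrer.lower() for k in self.keywords):
            return 404
        self.hits[host] += 1
        self.pages[ip].add(path)
        # Same referrer over and over, from a client that never browses further.
        if self.hits[host] > self.max_hits and len(self.pages[ip]) < self.min_pages:
            self.keywords.add(host)  # learn the host as a new keyword
            return 404
        return 200


f = ReferrerFilter(["poker"])
print([f.check("203.0.113.5", "/", "http://spam.example/") for _ in range(5)])
# prints [200, 200, 200, 404, 404]
```

Once a host has been learned, later requests carrying it are rejected by the keyword check alone, which is why the fifth request above fails without touching the counters.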
The only drawback to Defer is that it is not going to be a plugin. It replaces the index page (before Textpattern even THINKS of loading) and does its thing. I feel this is less intensive than it would be if it were a plugin stored in a database.
Any comments?
Re: Automatic Referral spam Blocking.
That looks pretty neat. Can you explain in more detail how the various criteria add up to determine if a referer is spam? Is it a certain number of hits in a given time period, plus a suspicious keyword, plus number of pages visited over a given threshold? What’s the formula, and how are the various values reached?
Thanks-
-Alan
#28 2005-02-04 23:57:44
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
A quick peek at the logs from my new gadget. This one doesn’t use keywords at all, and all blacklisting and whitelisting is entirely automatic:
<pre>
2005-02-04 10:50:10: key challenge passed, whitelisting [80.58.21.42] http://www.google.es/search?hl=es&q=zem&meta=
2005-02-04 11:27:21: ip blacklist threshold exceeded, blocking [203.112.194.81] online-poker.crescentarian.net/
2005-02-04 11:36:38: spam detected (L), blacklisting [12.161.206.2] poker-online.crescentarian.net/
2005-02-04 11:37:07: spam detected (L), blacklisting [211-23-250-101.HINET-IP.hinet.net] world-series-of-poker.yelucie.com/
2005-02-04 11:45:44: spam detected (L), blacklisting [tataelxsi.co.in] poker-rules.crescentarian.net/
2005-02-04 11:45:49: .js check tripped, blacklisting [203.197.169.19] http://poker-rules.crescentarian.net/
2005-02-04 11:50:19: ip blacklist threshold exceeded, blocking [203.112.194.81] pacific-poker.yelucie.com/
</pre>
Last edited by zem (2005-02-04 23:58:23)
Alex
Re: Automatic Referral spam Blocking.
If you’re not using any sort of keywords, I’d be interested in finding out how you’re checking whether something is spam or not.
What is this “key challenge”? Is this something user-input based?
Also, from that log, it looks as though you’re blacklisting IPs? Why? Those change way too frequently.
Defer is mostly automatic, but like any automatic “machine”, it will need adjusting occasionally to catch common strings. I’m ultimately aiming for a method to cut down on bandwidth/processing: the initial check is whether anything in the referring agent or GET/POST data matches what is in the list, promptly followed by a 404/exit().
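That early check could look something like the sketch below, shown as a Python WSGI wrapper purely for illustration. The `blocklist_gate` name is made up, reading “referring agent” as the Referer and User-Agent headers is an assumption, and only GET parameters are inspected here (POST bodies are left out to keep it short).

```python
from urllib.parse import parse_qsl


def blocklist_gate(app, blocklist):
    """Wrap a WSGI app; answer 404 before the app runs if a request field matches."""
    terms = [t.lower() for t in blocklist]

    def gate(environ, start_response):
        fields = [
            environ.get("HTTP_REFERER", ""),
            environ.get("HTTP_USER_AGENT", ""),
        ]
        # Add the values of any GET parameters.
        fields += [v for _, v in parse_qsl(environ.get("QUERY_STRING", ""))]
        haystack = " ".join(fields).lower()
        if any(t in haystack for t in terms):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return app(environ, start_response)  # only now does the real site load

    return gate


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


statuses = []
app = blocklist_gate(site, ["poker"])
app({"HTTP_REFERER": "http://poker-rules.example/"}, lambda s, h: statuses.append(s))
app({"HTTP_REFERER": "http://www.google.es/search?q=zem"}, lambda s, h: statuses.append(s))
print(statuses)
```

The point of the wrapper shape is that a blocked request costs one string scan and never reaches the wrapped application.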
I welcome any input zem may offer, as this is ultimately not for any monetary gain (at least for me) but a tool to combat spam.
Re: Automatic Referral spam Blocking.
> zem wrote:
> A quick peek at the logs from my new gadget. This one doesn’t use keywords at all, and all blacklisting and whitelisting is entirely automatic.
cough end user alpha testing cough
…what?
The following is true
The above statement is false.