Textpattern CMS support forum
#25 2005-01-31 00:10:52
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
Kibitzer,
Spammers already change their URLs and IPs frequently in order to turn manual blacklists into a game of whack-a-mole.
Alex
Offline
Re: Automatic Referral spam Blocking.
<pre>
66.55.149.35.choopa.net - - [04/Feb/2005:13:41:21 -0500] "GET / HTTP/1.1" 200 1708 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:42:21 -0500] "GET / HTTP/1.1" 200 1708 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:44:16 -0500] "GET / HTTP/1.1" 404 52 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
66.55.149.35.choopa.net - - [04/Feb/2005:13:44:34 -0500] "GET / HTTP/1.1" 404 52 "http://lowest-mortgage-rates.home.ro/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
</pre>
As you can see, the automatic referrer spam detection is working. I have a preliminary version working on my website and it has cut my referrer spam by a large margin. Some still sneak by, but I’m watching my logs to see whether it’s a new trick or something I missed in my code.
“Defer” (as I’m calling it) has several tricks up its sleeve. I decided against any Bayesian filtering and instead rely on a set of popular keywords, combined with a ‘logic system’ that checks how frequently a referrer has come in and how many pages that specific referrer IP has visited, then returns a 404 if it looks suspicious and adds that referral address to the list of existing keywords.
The only drawback to Defer is that it is not going to be a plugin. It replaces the index page (before Textpattern even THINKS of loading) and does its thing. I feel this is less intensive than it would be if it were a plugin stored in a database.
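In rough outline, the check looks something like this (a simplified PHP sketch with placeholder file names and thresholds, not the actual Defer code):
<pre>
<?php
// Simplified illustration only; not the actual Defer code.
// The keyword file, hit log, and thresholds are placeholders.

$keywords = (array) @file('defer-keywords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$referrer = isset($_SERVER['HTTP_REFERER']) ? strtolower($_SERVER['HTTP_REFERER']) : '';
$ip       = $_SERVER['REMOTE_ADDR'];

// 1. Keyword check against the referrer and any GET/POST values.
$haystack = $referrer . ' ' . strtolower(serialize($_GET) . serialize($_POST));
foreach ($keywords as $word) {
    if ($word !== '' && strpos($haystack, strtolower($word)) !== false) {
        header('HTTP/1.0 404 Not Found');
        exit('Not found.');
    }
}

// 2. Frequency check: how often has this IP hit us with this referrer recently?
if ($referrer !== '') {
    $now  = time();
    $hits = 0;
    $keep = array();
    foreach ((array) @file('defer-hits.txt', FILE_IGNORE_NEW_LINES) as $line) {
        $parts = explode("\t", $line, 3);
        if (count($parts) === 3 && $now - (int) $parts[0] < 3600) { // keep the last hour only
            $keep[] = $line;
            if ($parts[1] === $ip && $parts[2] === $referrer) {
                $hits++;
            }
        }
    }
    $keep[] = $now . "\t" . $ip . "\t" . $referrer;
    file_put_contents('defer-hits.txt', implode("\n", $keep) . "\n");

    if ($hits >= 5) {
        // Looks like referrer spam: learn the address, then refuse the request.
        file_put_contents('defer-keywords.txt', $referrer . "\n", FILE_APPEND);
        header('HTTP/1.0 404 Not Found');
        exit('Not found.');
    }
}

// Otherwise fall through to whatever the stock Textpattern index.php does.
?>
</pre>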
Any comments?
Offline
Re: Automatic Referral spam Blocking.
That looks pretty neat. Can you explain in more detail how the various criteria add up to determine if a referer is spam? Is it a certain number of hits in a given time period, plus a suspicious keyword, plus number of pages visited over a given threshold? What’s the formula, and how are the various values reached?
Thanks-
-Alan
Offline
#28 2005-02-04 23:57:44
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
A quick peek at the logs from my new gadget. This one doesn’t use keywords at all, and all blacklisting and whitelisting is entirely automatic:
<pre>
2005-02-04 10:50:10: key challenge passed, whitelisting [80.58.21.42] http://www.google.es/search?hl=es&q=zem&meta=
2005-02-04 11:27:21: ip blacklist threshold exceeded, blocking [203.112.194.81] online-poker.crescentarian.net/
2005-02-04 11:36:38: spam detected (L), blacklisting [12.161.206.2] poker-online.crescentarian.net/
2005-02-04 11:37:07: spam detected (L), blacklisting [211-23-250-101.HINET-IP.hinet.net] world-series-of-poker.yelucie.com/
2005-02-04 11:45:44: spam detected (L), blacklisting [tataelxsi.co.in] poker-rules.crescentarian.net/
2005-02-04 11:45:49: .js check tripped, blacklisting [203.197.169.19] http://poker-rules.crescentarian.net/
2005-02-04 11:50:19: ip blacklist threshold exceeded, blocking [203.112.194.81] pacific-poker.yelucie.com/
</pre>
Last edited by zem (2005-02-04 23:58:23)
Alex
Offline
Re: Automatic Referral spam Blocking.
If you’re not using any sort of keywords, I’d be interested to hear how you’re deciding whether something is spam or not.
What is this “key challenge”? Is it something user-input based?
Also, from that log, it looks as though you’re blacklisting IPs? Why? Those change way too frequently.
Defer is mostly automatic, but like any automatic “machine”, it will need adjusting occasionally to catch common strings. I’m ultimately aiming for a method to cut down on bandwidth/processing: the initial check is whether anything in the referring agent or GET/POST matches what is in the list, and if so it promptly issues a 404 and exit()s.
I welcome any input zem may offer, as this is ultimately not (at least for me) about any monetary gain, but a tool to combat spam.
Offline
Re: Automatic Referral spam Blocking.
> zem wrote:
> A quick peek at the logs from my new gadget. This one doesn’t use keywords at all, and all blacklisting and whitelisting is entirely automatic.
cough end user alpha testing cough
…what?
The following is true
The above statement is false.
Offline
#31 2005-02-05 02:25:26
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> What is this “key challenge”? Is it something user-input based?
Nope. It’s a trap (one of several) designed to automatically detect referrer spambots. No user or admin intervention required.
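(In general terms, a challenge-style trap can be as simple as the sketch below. This is a generic illustration only, not how Dereference actually works: every page references a keyed script that real browsers fetch and referrer spambots never do, so a referrer is only trusted once the challenge has been answered.)
<pre>
<?php
// Generic illustration of a challenge-style trap; NOT Dereference's actual logic.
// Real browsers fetch the keyed script referenced by each page, which proves the
// visit was a genuine page view. Spambots that only send fake referrer headers
// never fetch it, so their entries stay unverified for the other traps to judge.

$secret = 'change-me';                                   // placeholder secret
$key    = md5($_SERVER['REMOTE_ADDR'] . date('YmdH') . $secret);

if (isset($_GET['k'])) {
    if ($_GET['k'] === $key) {
        // The browser really fetched the challenge script: whitelist this IP for a while.
        file_put_contents('whitelist.txt', time() . "\t" . $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND);
    }
    header('Content-Type: text/javascript');
    exit('/* ok */');
}

// On a normal page view, the template would emit something like:
echo '<script type="text/javascript" src="challenge.php?k=' . $key . '"></script>';
?>
</pre>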
> Also, from that log, it looks as though you’re blacklisting IPs? Why? Those change way too frequently.
1. The blacklist is short-term only (6 hours by default), and requires multiple entries before an IP is blocked (see the sketch below).
2. Blocking based on URLs is dangerous – a malicious user could deliberately refer-spam someone else’s URL, and cause legitimate traffic to be blocked.
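In outline, a short-term threshold blacklist like that can be as simple as this (a sketch only, not the actual plugin code; the three-strike threshold is a placeholder, since the post only says “multiple”):
<pre>
<?php
// Sketch of a short-term, threshold-based IP blacklist; not the actual plugin code.
// From the post: entries expire (6 hours by default) and an IP needs multiple
// entries before it is blocked. The threshold of 3 is an invented placeholder.

define('BLACKLIST_TTL', 6 * 3600);
define('BLACKLIST_THRESHOLD', 3);

// Record a strike against $ip and return true once it has enough recent strikes.
function blacklist_strike($ip)
{
    $now   = time();
    $keep  = array();
    $count = 1; // the strike we are about to add

    foreach ((array) @file('blacklist.txt', FILE_IGNORE_NEW_LINES) as $line) {
        list($when, $who) = explode("\t", $line . "\t");
        if ($now - (int) $when < BLACKLIST_TTL) {   // drop expired entries
            $keep[] = $line;
            if ($who === $ip) {
                $count++;
            }
        }
    }

    $keep[] = $now . "\t" . $ip;
    file_put_contents('blacklist.txt', implode("\n", $keep) . "\n");

    return $count >= BLACKLIST_THRESHOLD;
}

// Called only when one of the spam traps fires for the current request:
if (blacklist_strike($_SERVER['REMOTE_ADDR'])) {
    header('HTTP/1.0 403 Forbidden');
    exit('Blocked: too many suspicious requests from this address.');
}
?>
</pre>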
> Defer is mostly automatic, but like any automatic “machine”, it will need adjusting occasionally to catch common strings.
Mine (“Dereference”) is set-and-forget. No intervention required after installation.
> I’m ultimately aiming for a method to cut down on bandwidth/processing: the initial check is whether anything in the referring agent or GET/POST matches what is in the list, and if so it promptly issues a 404 and exit()s.
404 sounds like a risky choice. There are more appropriate error messages.
> I welcome any input zem may offer, as this is ultimately not (at least for me) about any monetary gain, but a tool to combat spam.
Spam is economics. It’s all about money.
Alex
Offline
Re: Automatic Referral spam Blocking.
> 2. Blocking based on URLs is dangerous – a malicious user could
> deliberately refer-spam someone else’s URL, and cause legitimate
> traffic to be blocked.
I anticipated this and am planning an exception list (e.g. Google, the Google crawler, Yahoo Slurp, etc.).
> 404 sounds like a risky choice. There are more appropriate error
> messages.
Explain. How exactly is it a risky choice?
> Spam is economics. It’s all about money.
This we’ll disagree on. I’m inclined to believe that you’re trying to thwart any effect a free alternative will have vs. ransomware.
What happens when your traps are thwarted? Are you going to come out with a new version and charge for that? I’m aiming for something that can expand as necessary (e.g. keywords).
Offline
#33 2005-02-05 02:55:09
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> I anticipated this and am planning an exception list (e.g. Google, the Google crawler, Yahoo Slurp, etc.).
What about when the Evil Right Wing Blogs start refer-spamming the URLs of Evil Left Wing Blogs, and vice versa? Your exception list could get rather long.
> Explain. How exactly is it a risky choice?
Legitimate bots and user agents sometimes take action based on HTTP status codes. If you accidentally serve up a 404 to a search spider or blog indexer that’s not on your list, you could wind up de-listing yourself from a search engine or whatever. Or fooling a link checker bot, or misleading a user. Better to serve up an informative error so that legitimate bots and users can take appropriate action.
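For instance, “an informative error” could be as little as this (an illustration only; a 403 with an explanation is one reasonable choice, not something either tool is confirmed to send):
<pre>
<?php
// Example of serving an informative error instead of a bare 404.
// 403 Forbidden is one reasonable choice; this only illustrates the point above,
// it is not what Defer or Dereference actually sends.

header('HTTP/1.0 403 Forbidden');
header('Content-Type: text/html; charset=utf-8');
echo '<html><head><title>Request blocked</title></head><body>';
echo '<h1>Request blocked</h1>';
echo '<p>This request was refused because its referrer matched our referral-spam filter.</p>';
echo '<p>If you are a real visitor or run a legitimate bot, please contact the site owner ';
echo 'so the filter can be corrected.</p>';
echo '</body></html>';
exit;
?>
</pre>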
> This we’ll disagree on. I’m inclined to believe that you’re trying to thwart any effect a free alternative will have vs. ransomware.
Exactly what am I thwarting, and how?
Last edited by zem (2005-02-05 03:00:25)
Alex
Offline
Re: Automatic Referral spam Blocking.
> Legitimate bots and user agents sometimes take action based on HTTP status codes. If you accidentally serve up a 404 to a search spider or blog indexer that’s not on your list, you could wind up de-listing yourself from a search engine or whatever. Or fooling a link checker bot, or misleading a user. Better to serve up an informative error so that legitimate bots and users can take appropriate action.
Defer only acts when there is a referrer. Google/spiders/crawlers don’t spider websites using referrers. Delisting? That’s a silly thought (FUD).
The possibility that Defer may throw a false positive has been taken into consideration: every 404 it displays includes text stating why the request was blocked and how to correct it.
I’m not going to believe that any simple trap is going to catch a spammer; anything popular enough will be exploited and a weakness will be found. Your traps will be effective for only a short period of time before someone smarter than you figures out how to exploit or bypass them.
The most effective means has proven to be keywords and heuristics; of course, for $ reasons, you will disagree with me. Can’t say I think too highly of your offering “Dereference” as ransomware.
Offline
#35 2005-02-05 03:37:09
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Automatic Referral spam Blocking.
> Defer only acts when there is a referrer. Google/spiders/crawlers don’t spider websites using referrers. Delisting? That’s a silly thought (FUD).
blo.gs does. Also some link checkers, RSS aggregators, etc.
> I’m not going to believe that any simple trap is going to catch a spammer; anything popular enough will be exploited and a weakness will be found. Your traps will be effective for only a short period of time before someone smarter than you figures out how to exploit or bypass them.
Right, cos I hadn’t thought of that.
> The most effective means has proven to be keywords and heuristics,
How many successful keyword-based spam filters or web content filters are there? (“heuristics” is a bit misleading – any kind of filter could be described as heuristic.)
Alex
Offline