Textpattern CMS support forum

#31 2025-09-24 03:21:24

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

skewray wrote #340572:

I suspect these hits are Google Cloud, not fake IPs. FYI, my ham-handed solution:

# Google AS15169 Evil=96.2% 2025-08-27 Warning: May block Google employees....

The ranges are not a complete set, just what I’ve seen on my site. If I get the cookie thing working, I may rip this sort of stuff out. It is a bit labor intensive to create.

I found that the 500 error I was getting with this was caused by the whitespace. I will not be using it, but the configuration below worked for me.

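# Match if the user agent starts with Mozilla, contains aiohttp, or does not contain Google...
RewriteCond %{HTTP_USER_AGENT} "^Mozilla" [OR]
RewriteCond %{HTTP_USER_AGENT} "aiohttp" [OR]
RewriteCond %{HTTP_USER_AGENT} !Google
# ...AND the client address is in one of these Google ranges
RewriteCond %{REMOTE_ADDR} ^34\.([1-9]?\d|1[0-8]\d|19[01])\. [OR]
RewriteCond %{REMOTE_ADDR} ^35\.(20[89]|2[1-3]\d|24[0-7])\. [OR]
RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|[78]\d|9[0-5])\. [OR]
RewriteCond %{REMOTE_ADDR} ^104\.19[6-9]\. [NC]
# Return 403 Forbidden and set rej=R21 (used for logging)
RewriteRule .* - [F,E=rej:R21]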
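For anyone who finds the IP regexes hard to read: by my reading they cover 34.0.0.0 to 34.191.255.255, 35.208.0.0 to 35.247.255.255, 66.249.64.0 to 66.249.95.255 and 104.196.0.0 to 104.199.255.255. Below is an untested sketch of the same rule in the Apache 2.4 `<If>` style, with those ranges written as CIDR blocks for the `-R` operator. It assumes Apache 2.4 with mod_authz_core; unlike the RewriteRule, it does not set the `rej` logging variable. Keep the expression on a single line with no trailing whitespace.

```apache
# Sketch (untested): the same ranges as the regexes above, as CIDR blocks
#   34.0.0.0/9 + 34.128.0.0/10                     = 34.0.x.x to 34.191.x.x
#   35.208.0.0/12 + 35.224.0.0/12 + 35.240.0.0/13  = 35.208.x.x to 35.247.x.x
#   66.249.64.0/19                                 = 66.249.64.x to 66.249.95.x
#   104.196.0.0/14                                 = 104.196.x.x to 104.199.x.x
# Deny when the address matches AND the user agent starts with Mozilla,
# contains aiohttp, or does not contain Google (same intent as the RewriteConds)
<If "(-R '34.0.0.0/9' || -R '34.128.0.0/10' || -R '35.208.0.0/12' || -R '35.224.0.0/12' || -R '35.240.0.0/13' || -R '66.249.64.0/19' || -R '104.196.0.0/14') && (%{HTTP_USER_AGENT} =~ /^Mozilla/ || %{HTTP_USER_AGENT} =~ /aiohttp/ || %{HTTP_USER_AGENT} !~ /Google/)">
    Require all denied
</If>
```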

Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#32 2025-09-24 03:26:28

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 279

Re: Referrer spam

The official Apache .htaccess “standard” doesn’t allow comments on the same line after the [flags]. Mine does, so I document each line.
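Apache's own documentation states that comments may not be included on the same line as a configuration directive, so the portable way to keep per-line notes is to put each `#` comment on its own line directly above the directive it describes. A minimal sketch, reusing two of the ranges above purely for illustration:

```apache
RewriteEngine On
# Google AS15169: 34.0.x.x to 34.191.x.x
RewriteCond %{REMOTE_ADDR} ^34\.([1-9]?\d|1[0-8]\d|19[01])\. [OR]
# Google AS15169: 104.196.x.x to 104.199.x.x
RewriteCond %{REMOTE_ADDR} ^104\.19[6-9]\.
# Refuse the request with 403 Forbidden
RewriteRule .* - [F]
```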

The E=rej:R21 is for logging on my website. I should have taken it out before posting. (The “R21” rule has blocked 59 requests so far this month!)
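The post doesn't say how the `rej` variable reaches the logs. One common approach, sketched here as an assumption, is to print it with `%{rej}e` in a `LogFormat` and optionally use `env=rej` to send rejected requests to a separate file. `LogFormat` and `CustomLog` belong in the server or `<VirtualHost>` configuration (they are not allowed in .htaccess); the format nickname `withrej` and the log paths are hypothetical. Untested, in particular in combination with a custom 403 `ErrorDocument`.

```apache
# Server config or <VirtualHost> only (not .htaccess)
# %{rej}e prints the rej environment variable (e.g. R21), or "-" when it is unset
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\" rej=%{rej}e" withrej
CustomLog logs/access_log withrej
# Optional: a second log that only receives requests where rej is set
CustomLog logs/rejected_log withrej env=rej
```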

Last edited by skewray (2025-09-24 03:28:08)

#33 2025-09-25 04:06:08

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

colak wrote #340615:

Hi Philippe,
I use the latest 2.4 version, and the following in my htaccess.

<RequireAll>...

Just an update that this only worked for a day.

I’m now experimenting with

<If "%{REMOTE_ADDR} =~ /^34\.174\.\d{1,3}\.\d{1,3}/">
    Require all denied
</If>

and if it does not work, I’ll try

<If "%{REMOTE_ADDR} =~ /^34\.174\.(?:25[0-5]|2[0-4]\d|1?\d{1,2})\.(?:25[0-5]|2[0-4]\d|1?\d{1,2})/">
    Require all denied
</If>

or

<If "-R '34.174.0.0/16'">
    Require all denied
</If>

A suggestion by our host returned a 500 error.

<IfModule mod_authz_core.c>
    Require all granted
    Require not ip 34.174
</IfModule>

Previous versions of Apache were so much simpler!
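A likely cause of the host's 500 error, going by the mod_authz_core documentation: several `Require` lines placed directly in a section without a container are treated as if wrapped in `<RequireAny>`, and a negated `Require not` is not allowed there because it cannot grant access on its own. Wrapping the pair in `<RequireAll>` expresses "everyone except 34.174.x.x". An untested sketch that keeps the host's `<IfModule>` guard:

```apache
<IfModule mod_authz_core.c>
    # Every condition must hold: access is granted to all, AND the client
    # address must not start with 34.174 (a partial IP covers the whole /16)
    <RequireAll>
        Require all granted
        Require not ip 34.174
    </RequireAll>
</IfModule>
```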


Yiannis

#34 2025-10-02 05:49:32

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

Just a heads-up for anyone being hit by Google Cloud IPs in the 34.174. range: blocking all of them, which is what I did, also blocks the W3C validator, which apparently uses them too. Nothing I can do here except validate my articles by copy-pasting the source into the validator.


Yiannis

#35 2025-10-02 06:07:02

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 5,212

Re: Referrer spam

colak wrote #340810:

Blocking the … 34.174. range IPs … also blocks the w3 validator … Nothing I can do here except validate my articles by copy/pasting the source in the validator.

Does your favoured browser have a validator browser extension? I don’t know how they work – it’s possible they just pass on the source to the w3c validator and output the results – but maybe not. That might end up being quicker than what you have now.


TXP Builders – finely-crafted code, design and txp

#36 2025-10-02 08:06:47

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

jakob wrote #340811:

Does your favoured browser have a validator browser extension? I don’t know how they work – it’s possible they just pass on the source to the w3c validator and output the results – but maybe not. That might end up being quicker than what you have now.

It does, through the developer tools extension. Maybe that is what is passing through Google Cloud. I’ll check on it.

> Edit: It’s the W3C validator! Entering a URL there returns my 403 page.

Last edited by colak (2025-10-02 08:19:21)


Yiannis

#37 2025-10-02 15:21:36

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 279

Re: Referrer spam

My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.

#38 2025-10-03 04:52:41

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388
Website GitHub Mastodon Twitter

Re: Referrer spam

skewray wrote #340816:

My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.

Thanks so much, I had a look at it. The problem, as you mention, is that your directives also block employees of those companies. Also, we use Amazon to process our newsletter, which I wouldn’t want to break.

As we have an open-access ethos in all aspects of our organisation, at this stage we are only blocking attacks. According to our logs, ChatGPT and other AI crawlers do visit our site, but only sporadically and without overloading our server.

Now for the controversial part as decided by our committee.

  1. Although I personally share the concerns about LLM training, we decided not to block those bots, because we would rather have LLMs learn from our site than from others whose content may be questionable.
  2. Google is unfortunately the search engine of choice for the majority of people. Blocking it would be like shooting ourselves in the foot. At the same time, there are no real alternatives to Google Scholar and Google Books. The practices of many other popular search engines are questionable as well.
  3. The case of popular social media. Although we are all too aware of their unethical and manipulative practices, there is a practical and functional necessity to keep our accounts running to disseminate our projects.

On the bright side, today was the first day in some weeks that we had no hits from the 34.174 range.


Yiannis

#39 2025-10-03 15:31:20

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 279

Re: Referrer spam

I would definitely not suggest taking my entire solution as it is. It still changes from day to day, as an ongoing experiment on a website of zero financial importance.

I was thinking that if you are already blocking 34.174.X.X, then you could at least let through self-identifying bots. However, if the validator in question mimics a browser, then this is exactly what I block. I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.
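A minimal mod_rewrite sketch of that idea, assuming Apache 2.4 with mod_rewrite enabled: refuse 34.174.x.x clients only when their user agent starts with `Mozilla/` (i.e. claims to be a browser), so a self-identifying client such as `W3C_Validator/1.3 libwww-perl/6.68` is not matched. Bear in mind that the user agent is supplied by the client and can say anything.

```apache
RewriteEngine On
# Both conditions must match (no [OR]): cloud address AND browser-like user agent
RewriteCond %{REMOTE_ADDR} ^34\.174\.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/
# 403 Forbidden; self-identifying bots fail the second condition and pass through
RewriteRule ^ - [F]
```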

#40 2025-10-03 16:18:11

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

skewray wrote #340824:

I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.

It does. I tried using different directives to exclude the validator, but they returned a 500 error. Admittedly, I work better in the morning.
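One possible form of such an exclusion, in the same `<If>` style as the directives above: deny 34.174.0.0/16 unless the user agent starts with `W3C_Validator`. This is an untested sketch assuming Apache 2.4; since stray whitespace has caused 500 errors in this thread before, keep the whole expression on one line with no trailing spaces. As with any user-agent exemption, a scraper could claim the same string.

```apache
# Deny the 34.174.0.0/16 range, except clients identifying as the W3C validator
# (e.g. "W3C_Validator/1.3 libwww-perl/6.68")
<If "-R '34.174.0.0/16' && %{HTTP_USER_AGENT} !~ /^W3C_Validator/">
    Require all denied
</If>
```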

It does not worry me very much, as there have been no hits from the range in the past 24 hours or so. I’ll keep the directive as is until the end of next week and then, fingers crossed, I’ll comment it out.


Yiannis

#41 2025-11-16 23:34:09

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 279

Re: Referrer spam

On the subject of referrer spam, I’ve been getting hits from DuckDuckGo(!) with spamming cookies:

organic_source_str=Other;
user_agent=DuckDuckBot%2F1.1%3B%20%28%2Bhttp%3A%2F%2Fduckduckgo.com%2Fduckduckbot.html%29;
handl_original_ref=https%3A%2F%2Frobotalp.com;
handl_url=https%3A%2F%2Frvdepottx.com%2F; 
handl_landing_page=https%3A%2F%2Fmarqueecapitalfund.com%2F;
organic_source=https%3A%2F%2Frobotalp.com;
handl_ref=https%3A%2F%2Frvdepottx.com%2F;
_lgl_app_50=MW9YNXRaaVNhR2JRQmZOclZwdkJVZjNjaGlqS0J1VU9mR1VwWVp0MjBmckZ5dTU2ejVCNW9hMmdTTnRNbUNwUTViQnVDRWl1RXhDWEpFMzVPTmhtWTRRUUJUd0FXV3JvUnVXcUVtR3doa3NOM09LNzJIcDdxNXNRSjNTb0ZxZjhkbmcvelRyN1ZZV0J4MXFvTnczVi8wNnNIVUhLQStOVVBnOE9lK2hRZDljSkl2aXdSQXk3V3czTXBQeHdRVDEwLS1aY2M2Z3pxRHZaWkIxVWNkQlZBYzVnPT0%3D--663f049b3b06cc0277cef91d77afa6c48ddd86b7;
handl_url_base=https%3A%2F%2Frvdepottx.com%2F;
HandLtestDomainNameServer=HandLtestDomainValueServer;
handl_ip=40.88.21.235_IP40.88.21.235_;
organic_source_str=Other;
user_agent=DuckDuckBot%2F1.1%3B%20%28%2Bhttp%3A%2F%2Fduckduckgo.com%2Fduckduckbot.html%29;
handl_original_ref=https%3A%2F%2Frobotalp.com;
handl_url=https%3A%2F%2Frvdepottx.com%2F;
handl_landing_page=https%3A%2F%2Fmarqueecapitalfund.com%2F;
organic_source=https%3A%2F%2Frobotalp.com;
handl_ref=https%3A%2F%2Frvdepottx.com%2F;
_lgl_app_50=MW9YNXRaaVNhR2JRQmZOclZwdkJVZjNjaGlqS0J1VU9mR1VwWVp0MjBmckZ5dTU2ejVCNW9hMmdTTnRNbUNwUTViQnVDRWl1RXhDWEpFMzVPTmhtWTRRUUJUd0FXV3JvUnVXcUVtR3doa3NOM09LNzJIcDdxNXNRSjNTb0ZxZjhkbmcvelRyN1ZZV0J4MXFvTnczVi8wNnNIVUhLQStOVVBnOE9lK2hRZDljSkl2aXdSQXk3V3czTXBQeHdRVDEwLS1aY2M2Z3pxRHZaWkIxVWNkQlZBYzVnPT0%3D--663f049b3b06cc0277cef91d77afa6c48ddd86b7;
handl_url_base=https%3A%2F%2Frvdepottx.com%2F;
HandLtestDomainNameServer=HandLtestDomainValueServer;
handl

I am having a hard time seeing the point of these cookies. I only noticed them because I was logging cookies to debug something. Blocking it! (DuckDuckGo & ‘handl’)
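The exact rule isn't posted. A minimal sketch of a cookie-based block, assuming mod_rewrite: refuse any request whose `Cookie` header contains a cookie whose name starts with `handl`. The match is case-insensitive, so it also catches `HandLtestDomainNameServer`. It keys on the cookies rather than on the DuckDuckBot user-agent string, which a client can set to anything.

```apache
RewriteEngine On
# Match "handl..." at the start of the Cookie header or after a "; " separator
RewriteCond %{HTTP_COOKIE} (^|;\s*)handl [NC]
# 403 Forbidden
RewriteRule ^ - [F]
```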

Last edited by skewray (2025-11-16 23:35:11)

#42 2025-11-17 10:22:26

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,755

Re: Referrer spam

skewray wrote #341208:

Blocking it! (DuckDuckGo & ‘handl’)

That ‘handl’ stuff is likely UTM Grabber (see https://utmgrabber.com – I won’t link it as they don’t need the incoming links) if that helps.

Last edited by gaekwad (2025-11-17 10:22:37)

#43 2025-11-17 15:25:55

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,388

Re: Referrer spam

skewray wrote #341208:

On the subject of referrer spam, I’ve been getting hits from DuckDuckGo(!) with spamming cookies:

It’s probably not DDG but script kiddies trying to disguise themselves, unless DDG has changed its ethical practices.


Yiannis

#44 2025-11-17 16:09:54

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 279

Re: Referrer spam

Whoever it was appeared to come from a DDG IP address. That’s sophisticated middle-school technology right there.

#45 2025-11-17 16:23:18

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,755

Re: Referrer spam

skewray wrote #341219:

Whoever it was appeared to come from a DDG IP address. That’s sophisticated middle-school technology right there.

Please forgive any unintended terseness – the IP address in the cookie text above is 40.88.21.235, which is a Microsoft allocation, implying it’s an Azure host, not DuckDuckGo. Edit: the user agent is trivial to fake, so I’d take any DDG involvement with a heavy pinch of salt.

Last edited by gaekwad (2025-11-17 16:25:52)

Powered by FluxBB