Referrer spam

skewray · 2025-10-02 15:21:36

My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.

colak · 2025-10-03 04:52:41

skewray wrote #340816:

My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.

Thanks so much, I had a look at that. The problem, as you mention, is that your directives also block employees of the companies. Also, we use Amazon to process our newsletter, which I wouldn’t want to mess up.

As we have an open-access ethos in all aspects of our organisation, at this stage we are only blocking attacks. ChatGPT and other AI sites do visit our site according to our logs, but they do so sporadically and without overloading our server.

Now for the controversial part as decided by our committee.

Although I personally agree with the concerns about LLM training, we decided not to block those bots because we would rather have the LLMs informed by our site than by others with content that may be questionable.
Google is unfortunately the search engine of choice for the majority of people. Blocking it would be like shooting ourselves in the foot. At the same time, there are no real alternatives to Google Scholar and Google Books. The practices of many other popular search engines are questionable as well.
Then there is the case of popular social media. Although we are all too aware of their unethical and manipulative practices, there is a practical and functional necessity to keep our accounts running to disseminate our projects.

On the bright side, today was the first day in some weeks that we did not have any hits from the 34.174 range.

skewray · 2025-10-03 15:31:20

I would definitely not suggest taking my entire solution as it is. It still changes from day to day, as an ongoing experiment on a website of zero financial importance.

I was thinking that if you are already blocking 34.174.X.X, then you could at least let through self-identifying bots. However, if the validator in question mimics a browser, then this is exactly what I block. I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.
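The idea above (keep blocking the 34.174.X.X range, but let through bots that identify themselves) could be sketched roughly like this. It is only an illustration in Python, not the Codeberg rules or anyone's actual server directives. The function name `should_block`, the reading of 34.174.X.X as 34.174.0.0/16, and the token list are all assumptions made for the example.

```python
import ipaddress

# Assumption: "34.174.X.X" is read as the /16 network below.
BLOCKED_NETWORK = ipaddress.ip_network("34.174.0.0/16")

# Assumption: substrings that mark a self-identifying bot. Only the W3C
# validator string comes from the thread; extend the tuple as needed.
SELF_IDENTIFYING_TOKENS = ("W3C_Validator",)


def should_block(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be refused."""
    if ipaddress.ip_address(remote_ip) not in BLOCKED_NETWORK:
        return False  # outside the blocked range: not handled by this rule
    # Inside the blocked range: let through only agents that name themselves.
    return not any(token in user_agent for token in SELF_IDENTIFYING_TOKENS)
```

With this sketch, a request from inside the range carrying “W3C_Validator/1.3 libwww-perl/6.68” would pass, while a browser-like agent from the same range would be refused. Since a user agent string is easy to fake, this only filters out clients that do not bother to lie.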

colak · 2025-10-03 16:18:11

skewray wrote #340824:

I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.

It does. I tried using different directives to exclude the validator, but they returned a 500. Admittedly, I work better in the morning.

It does not worry me very much, as there were no hits from the range in the past 24 hours or so. I’ll keep the directive as is until the end of next week and then, keeping my fingers crossed, I’ll comment it out.

skewray · 2025-11-16 23:34:09

On the subject of referrer spam, I’ve been getting hits from DuckDuckGo(!) with spamming cookies:

organic_source_str=Other;
user_agent=DuckDuckBot%2F1.1%3B%20%28%2Bhttp%3A%2F%2Fduckduckgo.com%2Fduckduckbot.html%29;
handl_original_ref=https%3A%2F%2Frobotalp.com;
handl_url=https%3A%2F%2Frvdepottx.com%2F; 
handl_landing_page=https%3A%2F%2Fmarqueecapitalfund.com%2F;
organic_source=https%3A%2F%2Frobotalp.com;
handl_ref=https%3A%2F%2Frvdepottx.com%2F;
_lgl_app_50=MW9YNXRaaVNhR2JRQmZOclZwdkJVZjNjaGlqS0J1VU9mR1VwWVp0MjBmckZ5dTU2ejVCNW9hMmdTTnRNbUNwUTViQnVDRWl1RXhDWEpFMzVPTmhtWTRRUUJUd0FXV3JvUnVXcUVtR3doa3NOM09LNzJIcDdxNXNRSjNTb0ZxZjhkbmcvelRyN1ZZV0J4MXFvTnczVi8wNnNIVUhLQStOVVBnOE9lK2hRZDljSkl2aXdSQXk3V3czTXBQeHdRVDEwLS1aY2M2Z3pxRHZaWkIxVWNkQlZBYzVnPT0%3D--663f049b3b06cc0277cef91d77afa6c48ddd86b7;
handl_url_base=https%3A%2F%2Frvdepottx.com%2F;
HandLtestDomainNameServer=HandLtestDomainValueServer; 
handl_ip=40.88.21.235_IP40.88.21.235_; organic_source_str=Other; 
user_agent=DuckDuckBot%2F1.1%3B%20%28%2Bhttp%3A%2F%2Fduckduckgo.com%2Fduckduckbot.html%29;
handl_original_ref=https%3A%2F%2Frobotalp.com;
handl_url=https%3A%2F%2Frvdepottx.com%2F; handl_landing_page=https%3A%2F%2Fmarqueecapitalfund.com%2F; 
organic_source=https%3A%2F%2Frobotalp.com;
handl_ref=https%3A%2F%2Frvdepottx.com%2F;
_lgl_app_50=MW9YNXRaaVNhR2JRQmZOclZwdkJVZjNjaGlqS0J1VU9mR1VwWVp0MjBmckZ5dTU2ejVCNW9hMmdTTnRNbUNwUTViQnVDRWl1RXhDWEpFMzVPTmhtWTRRUUJUd0FXV3JvUnVXcUVtR3doa3NOM09LNzJIcDdxNXNRSjNTb0ZxZjhkbmcvelRyN1ZZV0J4MXFvTnczVi8wNnNIVUhLQStOVVBnOE9lK2hRZDljSkl2aXdSQXk3V3czTXBQeHdRVDEwLS1aY2M2Z3pxRHZaWkIxVWNkQlZBYzVnPT0%3D--663f049b3b06cc0277cef91d77afa6c48ddd86b7;
handl_url_base=https%3A%2F%2Frvdepottx.com%2F;
HandLtestDomainNameServer=HandLtestDomainValueServer;
handl

I am having a hard time seeing the point of these cookies. I only noticed it because I was putting cookies into my logs to debug something. Blocking it! (DuckDuckGo & ‘handl’)
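For anyone wanting to read a logged cookie line like the one above, here is a minimal Python sketch that splits it up and percent-decodes the values. The helper names `parse_cookie_header` and `looks_like_handl` are made up for this example, and matching on a `handl` name prefix is just one possible heuristic, not necessarily the rule actually used for the block.

```python
from urllib.parse import unquote


def parse_cookie_header(header: str) -> dict:
    """Split a raw Cookie header into a name -> decoded value mapping."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep:
            continue  # skip empty or truncated fragments such as a bare "handl"
        cookies[name] = unquote(value)  # a repeated name overwrites the earlier value
    return cookies


def looks_like_handl(cookies: dict) -> bool:
    """Heuristic: any cookie whose name starts with 'handl' (case-insensitive)."""
    return any(name.lower().startswith("handl") for name in cookies)


# A shortened excerpt of the logged cookies from the post above.
sample = (
    "user_agent=DuckDuckBot%2F1.1%3B%20%28%2Bhttp%3A%2F%2Fduckduckgo.com%2Fduckduckbot.html%29; "
    "handl_original_ref=https%3A%2F%2Frobotalp.com; "
    "handl_ip=40.88.21.235_IP40.88.21.235_; handl"
)
cookies = parse_cookie_header(sample)
print(cookies["user_agent"])      # the claimed agent, decoded
print(looks_like_handl(cookies))  # True for this sample
```

Decoded this way, the `user_agent` cookie reads “DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)”. Note that this is a value stored in a cookie, which is a separate thing from the User-Agent request header.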

Last edited by skewray (2025-11-16 23:35:11)

gaekwad · 2025-11-17 10:22:26

skewray wrote #341208:

Blocking it! (DuckDuckGo & ‘handl’)

That ‘handl’ stuff is likely UTM Grabber (see https://utmgrabber.com – I won’t link it as they don’t need the incoming links) if that helps.

Last edited by gaekwad (2025-11-17 10:22:37)

colak · 2025-11-17 15:25:55

skewray wrote #341208:

On the subject of referrer spam, I’ve been getting hits from DuckDuckGo(!) with spamming cookies:

It’s probably not DDG but kiddies trying to obfuscate themselves. Unless DDG has shifted their ethical practices.

skewray · 2025-11-17 16:09:54

Whoever it was appeared to come from a DDG IP address. That’s sophisticated middle-school technology right there.

gaekwad · 2025-11-17 16:23:18

skewray wrote #341219:

Whoever it was appeared to come from a DDG IP address. That’s sophisticated middle-school technology right there.

Please forgive any unintended terseness – the IP address in the cookie text above is 40.88.21.235, which is a Microsoft allocation…which implies it’s an Azure host, not DuckDuckGo. Edit: the user agent is trivial to fake, so I’d take any DDG involvement with a heavy pinch of salt.

Last edited by gaekwad (2025-11-17 16:25:52)

skewray · 2025-11-17 16:28:58

DDG has been using Microsoft Cloud for a long time now. their list

I block Azure, but have a hole for DuckDuckBot.
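A rough Python sketch of what “block the cloud range, but leave a hole for the crawler” might look like. This is not the actual configuration. The networks below are placeholders from the RFC 5737 documentation ranges standing in for the real Azure and DuckDuckBot lists, `allow_request` is a made-up name, and requiring both the claimed agent and a listed address is an assumption of the example.

```python
import ipaddress

# Placeholders only (RFC 5737 documentation ranges). The real cloud and
# crawler address lists are NOT reproduced here.
CLOUD_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]  # hypothetical blocked cloud range
CRAWLER_ADDRESSES = {ipaddress.ip_address("198.51.100.7")}  # hypothetical published crawler IP


def allow_request(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be let through."""
    ip = ipaddress.ip_address(remote_ip)
    if not any(ip in net for net in CLOUD_NETWORKS):
        return True  # not in the blocked cloud range at all
    # Inside the blocked range, the "hole" needs both the claimed agent and
    # an address on the published list, because the agent alone is easy to fake.
    return "DuckDuckBot" in user_agent and ip in CRAWLER_ADDRESSES
```

A check like this would not stop a request that really does come from a listed crawler address, so it would need to be combined with something else (a cookie-name rule, for instance) to catch that case.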

gaekwad · 2025-11-17 16:30:03

skewray wrote #341221:

DDG has been using Microsoft Cloud for a long time now. their list

I respectfully stand corrected. Thanks for the info!

Textpattern CMS

Textpattern CMS support forum
