Textpattern CMS support forum
Re: Referrer spam
My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.
Re: Referrer spam
skewray wrote #340816:
My solution to let through non-evil bots is at Codeberg and should work as long as the validator identifies itself somehow.
Thanks so much, I had a look at that. The problem, as you mention, is that your directives also block employees of the companies. We also use Amazon to process our newsletter, which I wouldn’t want to mess up.
As we have an open-access ethos in all aspects of our organisation, at this stage we are only blocking attacks. ChatGPT and other AI sites do visit our site according to our logs, but they do so sporadically and without overloading our server.
Now for the controversial part, as decided by our committee.
- Although I personally agree with the concerns about LLM training, we decided not to block those bots because we would rather have the LLMs informed by our site than by other sources whose content may be questionable.
- Google is unfortunately the search engine of choice for the majority of people. Blocking it would be like shooting ourselves in the foot. At the same time, there are no real alternatives to Google Scholar and Google Books. The practices of many other popular search engines are questionable as well.
- The case of popular social media: although we are all too aware of their unethical and manipulative practices, there is a practical and functional necessity to keep our accounts running to disseminate our projects.
On the bright side, today was the first day in some weeks that we did not have any hits from the 34.174 range.
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: Referrer spam
I would definitely not suggest taking my entire solution as it is. It still changes from day to day, as an ongoing experiment on a website of zero financial importance.
I was thinking that if you are already blocking 34.174.X.X, then you could at least let through self-identifying bots. However, if the validator in question mimics a browser, then this is exactly what I block. I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.
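To make the idea concrete, here is a minimal sketch of that rule in plain Python rather than server directives, since the actual directives are not shown in this thread. It assumes "34.174.X.X" means the 34.174.0.0/16 network, and the names `BLOCKED_NETWORK`, `SELF_IDENTIFYING_AGENTS` and `is_allowed` are made up for illustration. Only the W3C_Validator string comes from the discussion above.

```python
import ipaddress

# Assumption: "34.174.X.X" from the thread, written as a /16 network.
BLOCKED_NETWORK = ipaddress.ip_network("34.174.0.0/16")

# Assumption: substrings that mark a self-identifying bot. Only the
# W3C validator token is taken from the thread; extend as needed.
SELF_IDENTIFYING_AGENTS = ("W3C_Validator",)


def is_allowed(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be let through."""
    # Requests from outside the blocked range are never affected.
    if ipaddress.ip_address(remote_ip) not in BLOCKED_NETWORK:
        return True
    # Inside the blocked range, only self-identifying agents get through;
    # anything that mimics a browser stays blocked.
    return any(token in user_agent for token in SELF_IDENTIFYING_AGENTS)


if __name__ == "__main__":
    print(is_allowed("34.174.10.20", "W3C_Validator/1.3 libwww-perl/6.68"))
    print(is_allowed("34.174.10.20", "Mozilla/5.0"))
```

Bear in mind that a user agent string is supplied by the client, so an exception like this only helps with bots that identify themselves honestly.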
Re: Referrer spam
skewray wrote #340824:
I thought that the w3c validator identified itself with AGENT_STRING as “W3C_Validator/1.3 libwww-perl/6.68”? That would get through.
It does. I tried using different directives to exclude the validator, but they returned a 500. Admittedly, I work better in the morning.
It does not worry me very much, as there were no hits from the range in the past 24 hours or so. I’ll keep the directive as is until the end of next week and then, keeping my fingers crossed, I’ll comment it out.
Yiannis