Textpattern CMS support forum
Re: Txp human detection
skewray wrote #340627:
I am, however, slightly skeptical that Cloudflare will block the agents that are irritating me. I am trying to protect intellectual property from becoming LLM fodder; I think Cloudflare’s main intent is to prevent high server loads. Am I wrong here?
Cloudflare has specific rules and protocols for blocking LLM training bots: https://developers.cloudflare.com/bots/concepts/bot/#ai-bots
This is free as well. As for migrating your DNS to Cloudflare, it's very simple: you add your domain, Cloudflare imports your existing records, and all you have to do is change the nameservers at your domain registrar. I highly recommend letting Cloudflare manage your DNS, as it is low-latency and comes with a lot of free tools. And if you want Cloudflare to handle your domain registration, they charge at cost, with no additional fees. (No affiliation with Cloudflare, just a very happy free customer.)
Re: Txp human detection
I found a list of bad (and not-so-bad, see the comments) bots that may be of interest to you.
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: Txp human detection
I’ve seen this bot list before. But bad bots that identify themselves have become extremely rare. I already reject/accept all the self-identifying bots that I’ve seen come by my site. It’s the ones that pretend to be people that I am working on.
This week, Googlebot came by, as did another bot pretending to be a browser. They had the same IP address. Isn't that interesting?
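One way to tell whether a visitor claiming to be Googlebot is genuine is the reverse-then-forward DNS check that Google documents: look up the hostname for the IP, require it to end in googlebot.com, google.com or googleusercontent.com, then resolve that hostname and confirm it maps back to the same IP. Below is a minimal PHP sketch, assuming IPv4 only (gethostbynamel() returns IPv4 addresses). The function names are hypothetical, and the resolvers can be swapped out so the logic can be tested without network access.

```php
<?php
// Sketch (hypothetical helpers): verify a claimed Googlebot IP with a
// reverse DNS lookup followed by a forward-confirming lookup. IPv4 only.

// True if the hostname sits under one of the Google crawler domains.
function host_is_google(string $host): bool
{
    $host = strtolower(rtrim($host, '.'));
    foreach (['.googlebot.com', '.google.com', '.googleusercontent.com'] as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// $reverse / $forward default to PHP's own resolvers; tests can inject fakes.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the IP unchanged when there is no PTR record,
    // or false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip || !host_is_google($host)) {
        return false;
    }

    // Forward-confirm: the hostname must resolve back to the same IP,
    // otherwise anyone controlling their own PTR record could fake it.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

DNS lookups are slow, so a real site would probably cache the verdict per IP rather than resolve on every request.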
Re: Txp human detection
Bloke wrote #340548:
Inserting variables inside PHP blocks can be done in a few ways. From least to most future-proof:
<txp:variable name="my_var" value="Hello world" />...
<txp:variable name="my_var" value="Hello world" />...
<txp:variable name="my_var" value="Hello world" />...
Out of interest, as it came up in the other thread, can you also make txp:php pass its output into a variable?
I know you can do:
<txp:variable name="my_var" trim>
    <txp:php>
        echo "something";
    </txp:php>
</txp:variable>
and
<txp:php>
    global $variable;
    $variable['my_var'] = "something";
</txp:php>
but is there also a way via an attribute?
TXP Builders – finely-crafted code, design and txp
Re: Txp human detection
In 4.9, the variable attribute is global and should do exactly what you need:
<txp:php variable="my_var">
    echo "something";
</txp:php>
Re: Txp human detection
etc wrote #340714:
In 4.9, the variable attribute is global and should do exactly what you need:
<txp:php variable="my_var">...
Brilliant. It does! I tried it with the name attribute and had no luck, hence my question. Using variable as the attribute makes complete sense.
TXP Builders – finely-crafted code, design and txp
Re: Txp human detection
Working solution, for posterity. It’s a variation of Stef’s suggestion.
RewriteRule "^(articles/.+)" $1?tt=%{TIME} [L,R=303,CO=bot:1:www.mysite.com]
I then ran into the issue that LiteSpeed (a commercial Apache clone) adds a "." in front of the cookie domain, so I have to expire that cookie in PHP; otherwise, the browser ends up with two cookies with the same name and gets all confused:
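// Increment the visit counter carried in the "bot" cookie (first set to 1 by the rewrite rule).
if ( isset( $_COOKIE["bot"] ) ) {
    $counter = 1 + (int) $_COOKIE["bot"] ;
    // Expire the copy LiteSpeed scoped to ".www.mysite.com" so only one "bot" cookie remains.
    setcookie( "bot", "", time() - 3600, "/", ".www.mysite.com" ) ;
}
else {
    $counter = 1 ;
}
// Re-issue a host-only session cookie (expires 0 = when the browser closes).
setcookie( "bot", (string) $counter, 0, "/" ) ;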
Re: Txp human detection
Update: Since implementing the above, the bot rejection rate is above 80%. Specifically, the rejected agents were those that pretended to be browsers, arrived with no previous cookie, and then tried to revisit without presenting the new cookie.
I could improve on that by confirming that the revisit comes from the same IP address and falls within some small time window; crawler swarms violate both. Implementing these checks in Apache would be a fun exercise that I might do on a rainy day.
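The two extra checks described above (same IP, short time window) could also be done in PHP instead of Apache. A minimal sketch, assuming the cookie carries a hypothetical token instead of the plain counter: the token holds the issue time plus an HMAC of that time and the client IP, keyed with a hypothetical server-side secret, so a bot can neither reuse the cookie from another IP nor forge a fresh timestamp.

```php
<?php
// Sketch (hypothetical helpers, secret and window): bind the tracking cookie
// to the client IP and issue time, so a revisit only passes if it comes from
// the same IP within a short window.

// Token format: "<unix time>.<hex HMAC-SHA256 of 'time|ip'>".
function make_visit_token(string $ip, int $now, string $secret): string
{
    $mac = hash_hmac('sha256', $now . '|' . $ip, $secret);
    return $now . '.' . $mac;
}

function revisit_is_plausible(string $token, string $ip, int $now, int $window, string $secret): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || preg_match('/^\d+\z/', $parts[0]) !== 1) {
        return false; // missing or malformed token
    }
    [$issued, $mac] = $parts;

    // Recompute the HMAC for the *current* IP; a different IP or a
    // tampered value will not match.
    $expected = hash_hmac('sha256', $issued . '|' . $ip, $secret);
    if (!hash_equals($expected, $mac)) {
        return false;
    }

    // The revisit must fall inside the allowed time window.
    $age = $now - (int) $issued;
    return $age >= 0 && $age <= $window;
}
```

In practice $ip would come from $_SERVER['REMOTE_ADDR'] and the token would be set with setcookie() on the first visit. One trade-off: visitors whose IP changes between requests (for example on mobile networks) would also fail the check, so the window and the IP binding might need loosening.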