Textpattern CMS support forum
Re: Txp human detection
No messy code for article title, so that avenue is out.
Egregiously, 418 I'm a Teapot is missing from the Txp source. That should be added, along with an audio file of someone singing the song, “I’m a little teapot, short and stout.”
- .htaccess detects no cookie and sets 302 code
- Txp fires up error_302
- error_302 uses php to set cookie and refresh at top of source code
- real browser refreshes URL, loading content
Except, when I RTFC for Txp 4.8.8, it looks like 30X codes are redirected without first checking for error_xxx files. It really should be doing that in the other order. No 20X or 40X codes seem appropriate.
Re: Txp human detection
skewray wrote #340559:
Shared, so I have almost no control. It is cpanel; all I can do is tinker with .htaccess.
I highly recommend running your DNS through Cloudflare. They have powerful tools to stop aggressive bots and AI bots.
Re: Txp human detection
I created a brilliant bit of php (mostly ripped from txp_die()) and put it in human.php:
setcookie( "bot", 1, 0, "/" ) ;
header("Refresh:1") ;
$out = <<<eod
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
<title>Checking if Human</title>
</head>
<body>
<h1>Checking if human</h1>
<p>This page should refresh by itself, if you don't do so first.</p>
</body>
</html>
eod ;
echo $out ;
Then I added to .htaccess:
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L,R=302]
This refreshes, but the cookie is never set, so it does an infinite loop. If I modify it to not return 302,
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L]
Then the cookie is set, but the refresh does not happen. (A manual refresh loads the original Txp content.)
So, apparently the browser doesn’t load the cookie if the return code is a 302, and it doesn’t do a refresh for a 200. Maybe if I try (200+302)/2, it will do both???
Last edited by skewray (2025-09-25 15:21:36)
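A possible reading of the 302 loop, offered as a guess because the thread never confirms it: with R=302 the browser's address changes to human.php, so a bare Refresh: 1 header reloads human.php itself rather than the article, whether or not the cookie was stored. A minimal sketch of one way around that passes the original path along in the redirect. The query parameter name return is hypothetical; human.php as posted does not read it, and it would need to check that the value is a local path before using it as the url= target of its Refresh header.

```apache
RewriteEngine On

# Only visitors without a bot=<digits> cookie are sent to the check page.
RewriteCond %{HTTP_COOKIE} !bot=\d+

# External redirect that carries the requested path along.
# "return" is a hypothetical parameter name, not part of the posted human.php.
RewriteRule ^articles/.+ /human.php?return=%{REQUEST_URI} [L,R=302,QSA]
```

A rule like this probably has to sit above Textpattern's own rewrite rules in the same file, which is also an assumption about how the .htaccess is laid out.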
Re: Txp human detection
Browsers should send cookies on 302s. At least, they used to. Maybe that’s been changed.
Out of curiosity, what happens if you set SameSite to Lax?
$cookie_options = array(
'expires' => 0,
'path' => '/',
'samesite' => 'Lax',
);
setcookie('bot', 1, $cookie_options);
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Hire Txp Builders – finely-crafted code, design and Txp
Re: Txp human detection
Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.
I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.
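On the .htaccess side, one detail may be worth ruling out. The mod_rewrite documentation writes a negated, quoted pattern with the ! inside the quotes, and it is not certain that Apache parses !"bot=(\d+)" the same way, so this is only a guess at a cause. The sketch below shows the cookie test in the documented style, with the cookie name anchored so that a cookie whose name merely ends in "bot" does not count. It assumes human.php lives in the same directory as the .htaccess file.

```apache
RewriteEngine On

# Negation goes inside the quotes. (^|;\s*) anchors the cookie name at the
# start of the Cookie header or just after a ";" separator.
RewriteCond "%{HTTP_COOKIE}" "!(^|;\s*)bot=\d+"

# Internal rewrite (no R flag): the address bar keeps the article URL.
RewriteRule "^articles/.+" "human.php" [L]
```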
Re: Txp human detection
With the risk of wrath, I’m gonna ask this question…have you considered Cloudflare (free tier) protection?
Re: Txp human detection
skewray wrote #340616:
Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.
I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.
Hi,
You can send a no-cache directive in your response headers. That normally works.
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
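The no-cache suggestion can also be applied from .htaccess. A minimal sketch, assuming the host has mod_headers enabled and allows Header directives in .htaccess (neither is confirmed in the thread):

```apache
<IfModule mod_headers.c>
    <Files "human.php">
        # Ask browsers and proxies not to store the check page, so a
        # refresh has to go back to the server.
        Header set Cache-Control "no-store, max-age=0"
    </Files>
</IfModule>
```

Scoping the header to human.php leaves caching of the real article pages untouched.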
Re: Txp human detection
I would have to switch hosting services, which I don’t really want to do. My hosting service does have something similar. Second, I don’t think Cloudflare stops the bots in question.
For those with Cloudflare, does it stop bots coming from an ISP that do a single access and never come back? Can you check your logs?
Re: Txp human detection
skewray wrote #340616:
I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?)
Might be getting silly, but adding a cache-busting ?v=some-timestamp-value to your requests might break out of the loop.
Re: Txp human detection
skewray wrote #340620:
I would have to switch hosting services, which I don’t really want to do. My hosting service does have something similar. Second, I don’t think Cloudflare stops the bots in question.
You’d need to switch your name servers to Cloudflare, and then Cloudflare handles your DNS records. Assuming your domain registrar allows you to change name servers to Cloudflare, you’re golden – no change of hosting company needed.
The process is:
- Add domain to Cloudflare (no changes to existing DNS yet)
- Cloudflare detects existing DNS records, imports them into its system (no changes to existing DNS yet)
- Cloudflare tells you the two name servers for your domain (no changes to existing DNS yet)
- You switch your domain to use the Cloudflare name servers (this switches DNS to Cloudflare)
- You then start using Cloudflare for your DNS records stuff.
I can do a screencast example if you’d like to see how it’s done.
For those with Cloudflare, does it stop bots coming from an ISP that do a single access and never come back? Can you check your logs?
Cloudflare has all manner of checks for bots, even on the free tier. You get more granularity & services if you pay them, but the free tier is absolutely fine for a lot of people who don’t want to worry about most bots. It won’t stop proper search bots checking your site, so you’re unlikely to lose any SERPs ranking as a result.
I use Cloudflare for almost all my sites’ DNS since it’s reliably the fastest (or top 3) DNS resolver. I don’t personally use the protection side of things, but I know if / when I get trounced by whatever invasion, I can flip a switch and get a heap of stuff at my disposal.
And you can always switch your name servers back again; you’re not beholden to Cloudflare for anything going forward.
Re: Txp human detection
I set Cache-Control: no-cache,no-store,max-age=0 in both .htaccess and the php code. Very belt-and-suspenders. No change in behaviour; it still loads a cookie but refreshes to human.php content. Caching doesn’t seem to be the (only) issue.
Last edited by skewray (2025-09-23 17:13:08)
Re: Txp human detection
Cloudflare: I checked; my hosting does not allow a change in DNS server. I guess I could ask them if they could make a special hole for me. They’ve done stuff like that before.
I am, however, slightly skeptical that Cloudflare will block the agents that are irritating me. I am trying to protect intellectual property from becoming LLM fodder; I think Cloudflare’s main intent is to prevent high server loads. Am I wrong here?
Re: Txp human detection
I tried:
RewriteRule "^articles/.+" human.php?tt=%{TIME} [QSA,L,R=302]
It still reloads human.php. Mysteries!
(I’ve only tested with Firefox. Could be that, but it should work in every browser anyway.)
Last edited by skewray (2025-09-25 15:21:58)
Re: Txp human detection
skewray wrote #340627:
Cloudflare: I checked; my hosting does not allow a change in DNS server. I guess I could ask them if they could make a special hole for me. They’ve done stuff like that before.
May I ask which domain(s) is / are affected? It’s a domain registrar thing rather than a hosting thing, and I can’t recall ever seeing a domain registrar locking down name servers so they can’t be changed.
(Room read: I’m aiming for ‘helpful’ not ‘pedantic’.)
I am, however, slightly skeptical that Cloudflare will block the agents that are irritating me. I am trying to protect intellectual property from becoming LLM fodder; I think Cloudflare’s main intent is to prevent high server loads. Am I wrong here?
There’s nothing stopping you going belt + braces with Cloudflare + your own server-side efforts. It’s possible there’s more to the bot traffic that’s outside the scope of the logs, and it’s possible that it’s not just you being affected by it…so it’s possible Cloudflare have something in place to stop some / most / all of it already. For the sake of a name server switch to Cloudflare for 24 hours or so for an A / B check on your traffic, I’d consider doing it. Factor in your time and defensive measures already in place, and it might pay off.
…and worst case scenario, just flip your name servers back, and you’re out of Cloudflare. Done.
Re: Txp human detection
Since I only tinker with DNS records once a decade, all knowledge drains out of my brainpan. Previously I was looking at the DNS records at my hosting provider, which, in fact, does not hold my DNS records. My DNS records at my email provider have an “A” record that points to an IP address at my hosting provider. Cool. I assume that I tell Cloudflare about that IP address, so that my website doesn’t become unmoored from the universe. I may try this at some point.