Textpattern CMS support forum
Re: Txp human detection
Is it too late in the chain to do anything in Txp itself? e.g. by (ab)using Pages named error_{statuscode} so they do some mapping skulduggery for you? error_303, error_404, etc. will get called every time Apache encounters the corresponding code, and you could take action or do further redirects/processing/filtering as appropriate.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Re: Txp human detection
That’s an interesting idea. My current .htaccess file always stops processing when setting a return code. I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?
(By the way, I can find no documentation on messy URLs. The admin sets them, but nowhere can I find a description of all the possible variations that would lead to a given Txp product.)
Re: Txp human detection
skewray wrote #340604:
I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?
That’s the general idea, yes. It allows you to display different stuff based on the server return code, or at least on the cut-down set of codes we actually handle.
See txp:if_status and error_status (which you could use in an attribute of a <txp:evaluate> tag to test, if you prefer).
By the way, I can find no documentation on messy URLs.
It’s all messy syntax under the hood. The clean URLs are just prettier replacements. The main culprits are:
- id: Asset ID (cross-content type)
- s: Article Section
- c: Category (cross-content type)
- author: Author (cross-content type)
- q: Search
- month: Month
- pg: Current results page (for pagination)
- filename: File (if s=file_download)
- f: Form (for passing content to Forms from the URL)
- status: Web server status code for the resource being accessed
- p: (legacy) Image extraction and navigation
There are many others used internally, and if you start donkeying with those, things will probably break. Examples are context, feed, skin (front-end theme), page, css, and so on.
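As a rough illustration of how the parameters listed above combine, here is a minimal sketch that assembles a messy URL with PHP's `http_build_query()`. The helper name `messy_url()`, the `example.com` host and the section/category values are made up for the example; only the parameter names come from the list.

```php
<?php
// Hypothetical helper (not part of Txp): build a messy URL from query parameters.
function messy_url(string $base, array $params): string
{
    // Messy URLs are plain query strings against index.php.
    return rtrim($base, '/') . '/index.php?' . http_build_query($params);
}

// Section "articles", category "news", second page of results (all assumed values).
echo messy_url('https://example.com', ['s' => 'articles', 'c' => 'news', 'pg' => 2]), "\n";
// prints https://example.com/index.php?s=articles&c=news&pg=2
```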
Re: Txp human detection
No messy code for article title, so that avenue is out.
Egregiously, 418 I'm a Teapot is missing from the Txp source. That should be added, along with an audio file of someone singing the song, “I’m a little teapot, short and stout.”
- .htaccess detects no cookie and sets 302 code
- Txp fires up error_302
- error_302 uses PHP to set cookie and refresh at top of source code
- real browser refreshes URL, loading content
Except, when I RTFC for Txp 4.8.8, it looks like 30X codes are redirected without first checking for error_xxx files. It really should be doing that in the other order. No 20X or 40X codes seem appropriate.
Re: Txp human detection
skewray wrote #340559:
Shared, so I have almost no control. It is cpanel; all I can do is tinker with .htaccess.
I highly recommend running your DNS through Cloudflare. They have powerful tools to stop aggressive bots and AI bots.
Re: Txp human detection
I created a brilliant bit of php (mostly ripped from txp_die()) and put it in human.php:
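<?php
// Session cookie "bot" (expires=0), valid for the whole site (path "/").
setcookie( "bot", 1, 0, "/" ) ;
// Ask the browser to reload the current URL after one second.
header("Refresh:1") ;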
$out = <<<eod
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
<title>Checking if Human</title>
</head>
<body>
<h1>Checking if human</h1>
<p>This page should refresh by itself, if you don't do so first.</p>
</body>
</html>
eod ;
echo $out ;
Then I added to .htaccess:
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L,R=302]
This refreshes, but the cookie is never set, so it does an infinite loop. If I modify it to not return 302,
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L]
Then the cookie is set, but the refresh does not happen. (A manual refresh loads the original Txp content.)
So, apparently the browser doesn’t load the cookie if the return code is a 302, and it doesn’t do a refresh for a 200. Maybe if I try (200+302)/2, it will do both???
Last edited by skewray (2025-09-25 15:21:36)
Re: Txp human detection
Browsers should send cookies on 302s. At least, they used to. Maybe that’s been changed.
Out of curiosity, what happens if you set SameSite to Lax?
$cookie_options = array(
    'expires' => 0,
    'path' => '/',
    'samesite' => 'Lax',
);
setcookie('bot', 1, $cookie_options);
Re: Txp human detection
Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.
I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.
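One possible explanation for the R=302 variant, not confirmed anywhere in this thread: the external redirect changes the browser's address to human.php, so a bare `Refresh` header reloads human.php itself rather than the article. A minimal sketch of a workaround follows. It assumes the rewrite rule passes the original path along in a hypothetical `back` query parameter (for instance by rewriting to `human.php?back=%{REQUEST_URI}`), and that human.php then refreshes to that path. The function name `safe_refresh_target()` is also made up for the example.

```php
<?php
// Hypothetical helper: accept only same-site paths as a refresh target.
function safe_refresh_target(?string $back): string
{
    // Must be a non-empty path starting with a single "/" (no scheme, no "//host").
    if ($back === null || $back === '' || $back[0] !== '/' || strncmp($back, '//', 2) === 0) {
        return '/';
    }
    // Reject backslashes and line breaks, which some clients treat specially.
    if (preg_match('/[\\\\\r\n]/', $back)) {
        return '/';
    }
    return $back;
}

// "back" is an assumed parameter name; it is not used anywhere in the thread.
$target = safe_refresh_target($_GET['back'] ?? null);

setcookie('bot', '1', ['expires' => 0, 'path' => '/', 'samesite' => 'Lax']);
// Send the browser to the original article instead of reloading human.php.
header('Refresh: 1; url=' . $target);
```

Validating the target keeps the page from becoming an open redirect; anything that is not a plain local path falls back to the site root.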
Re: Txp human detection
At the risk of wrath, I’m gonna ask this question: have you considered Cloudflare (free tier) protection?
Re: Txp human detection
skewray wrote #340616:
Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.
I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.
Hi,
You can set a no-cache directive in your response headers. That normally works.
Yiannis
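A minimal sketch of what the no-cache suggestion could look like in human.php, assuming the headers are sent from PHP rather than from .htaccess. The function name `no_cache_headers()` is made up for the example; the header names and values are standard HTTP.

```php
<?php
// Hypothetical helper: headers asking browsers and proxies not to cache the page.
function no_cache_headers(): array
{
    return [
        'Cache-Control: no-store, no-cache, must-revalidate, max-age=0',
        'Pragma: no-cache',   // for old HTTP/1.0 caches
        'Expires: 0',         // an invalid date is treated as already expired
    ];
}

// These must be sent before any page output.
foreach (no_cache_headers() as $line) {
    header($line);
}
```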
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: Txp human detection
First, I would have to switch hosting services, which I don’t really want to do, and my hosting service does have something similar. Second, I don’t think Cloudflare stops the bots in question.
For those with Cloudflare, does it stop bots coming from an ISP that do a single access and never come back? Can you check your logs?
Re: Txp human detection
skewray wrote #340616:
I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?)
Might be getting silly, but adding a cache-busting ?v=some-timestamp-value to your requests might break out of the loop.
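The cache-busting idea could be sketched roughly as below. The helper name `add_cache_buster()` and the example paths are assumptions. Whether the extra `v` parameter actually breaks the loop is untested here.

```php
<?php
// Hypothetical helper: append a cache-busting "v" parameter to a URL.
function add_cache_buster(string $url, int $stamp): string
{
    // Use "&" when the URL already carries a query string (e.g. a messy Txp URL).
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'v=' . $stamp;
}

// Example paths are made up; time() gives a value that changes on every request.
echo add_cache_buster('/articles/some-title', time()), "\n";
echo add_cache_buster('/index.php?id=5', time()), "\n";
```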