Textpattern CMS support forum
Re: Txp human detection
Is it too late in the chain to do anything in Txp itself? e.g. by (ab)using Pages named error_{statuscode} so they do some mapping skulduggery for you? error_303, error_404, etc. will get called every time Apache encounters the corresponding code, and you could take action or do further redirects/processing/filtering as appropriate.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Re: Txp human detection
That’s an interesting idea. My current .htaccess file always stops processing when setting a return code. I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?
(By the way, I can find no documentation on messy URLs. The admin sets them, but nowhere can I find a description of all the possible variations that would lead to a given Txp product.)
Re: Txp human detection
skewray wrote #340604:
I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?
That’s the general idea, yes. It allows you to display different stuff based on the server return code – or, at least, our cut-down set of codes we actually handle.
See txp:if_status and error_status (which you could use in an attribute of a <txp:evaluate> tag to test, if you prefer).
By the way, I can find no documentation on messy URLs.
It’s all messy syntax under the hood. The clean URLs are just prettier replacements. The main culprits are:
id= Asset ID (cross-content type)
s= Article Section
c= Category (cross-content type)
author= Author (cross-content type)
q= Search
month= Month
pg= Current results page (for pagination)
filename= File (if s=file_download)
f= Form (for passing content to Forms from the URL)
status= Web server status code for the resource being accessed
p= (legacy) Image extraction and navigation
There are many others used internally, and if you start donkeying with those, things will probably break. Examples are context, feed, skin (front-end theme), page, css, and so on.
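To make the list concrete, here is a minimal sketch in PHP of how a few of those parameters might combine into one messy URL. The helper name messy_url(), the example.com address, and the section and category names are all made up for illustration; only the parameter names (s, c, pg) come from the list above.

```php
<?php
// Hypothetical helper (not part of Txp): append messy-style
// parameters to a base URL as a query string.
function messy_url(string $base, array $params): string
{
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    $query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

    return $query === '' ? $base : $base . '?' . $query;
}

// Page 2 of a made-up "news" category in a made-up "articles" section.
echo messy_url('https://example.com/index.php', [
    's'  => 'articles',
    'c'  => 'news',
    'pg' => 2,
]), "\n";
// prints https://example.com/index.php?s=articles&c=news&pg=2
```

This only assembles the query string. Which combinations Txp actually honours is a separate question, so treat the output as something to try rather than a guarantee.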
Re: Txp human detection
No messy code for article title, so that avenue is out.
Egregiously, 418 I'm a Teapot is missing from the Txp source. That should be added, along with an audio file of someone singing the song, “I’m a little teapot, short and stout.”
- .htaccess detects no cookie and sets 302 code
- Txp fires up error_302
- error_302 uses php to set cookie and refresh at top of source code
- real browser refreshes URL, loading content
Except, when I RTFC for Txp 4.8.8, it looks like 30X codes are redirected without first checking for error_xxx files. It really should be doing that in the other order. No 20X or 40X codes seem appropriate.
Re: Txp human detection
skewray wrote #340559:
Shared, so I have almost no control. It is cpanel; all I can do is tinker with .htaccess.
I highly recommend running your DNS through Cloudflare. They have powerful tools to stop aggressive bots and AI bots.
#18 Today 06:48:36
Re: Txp human detection
I created a brilliant bit of php (mostly ripped from txp_die()) and put it in human.php:
<?php
setcookie( "bot", 1, 0, "/" ) ;
header("Refresh:1") ;
$out = <<<eod
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
<title>Checking if Human</title>
</head>
<body>
<h1>Checking if human</h1>
<p>This page should refresh by itself, if you don't do so first.</p>
</body>
</html>
eod ;
echo $out ;
?>
Then I added to .htaccess:
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L,R=302]
This refreshes, but the cookie is never set, so it does an infinite loop. If I modify it to not return 302,
RewriteCond %{HTTP_COOKIE} !"bot=(\d+)"
RewriteRule "^articles/.+" human.php [L]
Then the cookie is set, but the refresh does not happen. (A manual refresh loads the original Txp content.)
So, apparently the browser doesn’t load the cookie if the return code is a 302, and it doesn’t do a refresh for a 200. Maybe if I try (200+302)/2, it will do both???
Last edited by skewray (Today 06:49:28)
#19 Today 08:23:05
Re: Txp human detection
Browsers should send cookies on 302s. At least, they used to. Maybe that’s been changed.
Out of curiosity, what happens if you set SameSite to Lax?
$cookie_options = array(
    'expires' => 0,
    'path' => '/',
    'samesite' => 'Lax',
);
setcookie('bot', 1, $cookie_options);
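For anyone wanting to try that, here is a rough sketch of a human.php variant with the options array folded in. It assumes PHP 7.3 or later, since the options-array form of setcookie() does not exist before that. The helper names bot_cookie_options() and refresh_header() are made up for illustration, and naming the refresh target explicitly is an assumption, not something tested in this thread.

```php
<?php
// Hypothetical helpers; the names are illustrative and not part of Txp.

// Options for the marker cookie: session lifetime, site-wide, SameSite=Lax.
function bot_cookie_options(): array
{
    return [
        'expires'  => 0,
        'path'     => '/',
        'samesite' => 'Lax',
    ];
}

// Build a Refresh header line that points at a site-relative path.
// Anything that is not a plain "/path" (for example "//host" or a full
// URL) falls back to "/" so the header cannot send visitors off-site.
function refresh_header(string $target, int $delay = 1): string
{
    if (!preg_match('#^/(?!/)[^\r\n]*\z#', $target)) {
        $target = '/';
    }

    return "Refresh: {$delay}; url={$target}";
}

// Only touch headers when served over HTTP, so the helpers can be
// exercised from the command line without side effects.
if (PHP_SAPI !== 'cli') {
    setcookie('bot', '1', bot_cookie_options());
    header(refresh_header($_SERVER['REQUEST_URI'] ?? '/'));
}
```

With the rule that omits R=302, the request URI seen by PHP is still the original articles/... address, so the refresh would point back at the article. With R=302 the browser has already moved to human.php, so the same line would refresh human.php itself.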