Textpattern CMS support forum

#13 Yesterday 14:59:39

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,948

Re: Txp human detection

Is it too late in the chain to do anything in Txp itself? For example, by (ab)using Pages named error_{statuscode} so they do some mapping skulduggery for you? error_303, error_404, etc. will get called every time Apache encounters the corresponding code, and you could take action or do further redirects/processing/filtering as appropriate.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#14 Yesterday 16:31:50

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 253

That’s an interesting idea. My current .htaccess file always stops processing when setting a return code. I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?

(By the way, I can find no documentation on messy URLs. The admin sets them, but nowhere can I find a description of all the possible variations that would lead to a given Txp product.)

#15 Yesterday 16:57:37

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,948

skewray wrote #340604:

“I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?”

That’s the general idea, yes. It allows you to display different stuff based on the server return code – or, at least, our cut-down set of codes we actually handle.

See txp:if_status and error_status (which you could use in an attribute of a <txp:evaluate> tag to test, if you prefer).

“By the way, I can find no documentation on messy URLs.”

It’s all messy syntax under the hood. The clean URLs are just prettier replacements. The main culprits are:

  • id= Asset ID (cross-content type)
  • s= Article Section
  • c= Category (cross-content type)
  • author= Author (cross-content type)
  • q= Search
  • month= Month
  • pg= Current results page (for pagination)
  • filename= File (if s=file_download)
  • f= Form (for passing content to Forms from the URL)
  • status= Web server status code for the resource being accessed
  • p= (legacy) Image extraction and navigation

There are many others used internally, and if you start donkeying with those, things will probably break. Examples are context, feed, skin (front-end theme), page, css, and so on.
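As a rough illustration of how the parameters above combine, here is a minimal PHP sketch that assembles a messy URL with http_build_query(). The messy_url() helper, the example.com host, and the parameter values are made up for illustration and are not part of Txp; which combinations Txp actually honours is best checked against a live site.

```php
<?php
// Hypothetical helper (not part of Txp): builds a messy-style URL
// from the query parameters listed above.
function messy_url(string $base, array $params): string
{
    // Drop empty values so unused parameters do not appear in the URL.
    $params = array_filter($params, function ($value) {
        return $value !== null && $value !== '';
    });

    return $params
        ? $base . '?' . http_build_query($params, '', '&')
        : $base;
}

// Assumed example values: section "articles", category "news", page 2.
echo messy_url('https://example.com/index.php', [
    's'  => 'articles',
    'c'  => 'news',
    'pg' => 2,
]), PHP_EOL;
```

Passing the separator explicitly keeps the output independent of the server's arg_separator.output setting.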


#16 Yesterday 18:08:34

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 253

No messy code for article title, so that avenue is out.

Egregiously, 418 I'm a Teapot is missing from the Txp source. That should be added, along with an audio file of someone singing the song, “I’m a little teapot, short and stout.”

So,
  • .htaccess detects no cookie and sets 302 code
  • Txp fires up error_302
  • error_302 uses PHP to set cookie and refresh at top of source code
  • real browser refreshes URL, loading content

Except, when I RTFC for Txp 4.8.8, it looks like 30X codes are redirected without first checking for error_xxx files. It really should be doing that in the other order. No 20X or 40X codes seem appropriate.

#17 Yesterday 18:49:03

vistopher
Plugin Author
Registered: 2025-09-15
Posts: 8

skewray wrote #340559:

“Shared, so I have almost no control. It is cPanel; all I can do is tinker with .htaccess.”

I highly recommend running your DNS through Cloudflare. They have powerful tools to stop aggressive bots and AI bots.

#18 Today 06:48:36

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 253

I created a brilliant bit of PHP (mostly ripped from txp_die()) and put it in human.php:

<?php
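// Session cookie (expires = 0) named "bot", valid for the whole site.
setcookie( "bot", 1, 0, "/" ) ;
// Ask the browser to reload the current URL after one second.
header("Refresh:1") ;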
$out = <<<eod
<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="utf-8">
   <meta name="robots" content="noindex">
   <title>Checking if Human</title>
</head>
<body>
    <h1>Checking if human</h1>
    <p>This page should refresh by itself, if you don't do so first.</p>
</body>
</html>
eod ;
echo $out ;
?>

Then I added to .htaccess:

RewriteCond %{HTTP_COOKIE}     !"bot=(\d+)"
RewriteRule "^articles/.+"     human.php              [L,R=302]

This refreshes, but the cookie is never set, so it does an infinite loop. If I modify it to not return 302,

RewriteCond %{HTTP_COOKIE}    !"bot=(\d+)"
RewriteRule "^articles/.+"    human.php              [L]

Then the cookie is set, but the refresh does not happen. (A manual refresh loads the original Txp content.)

So, apparently the browser doesn’t load the cookie if the return code is a 302, and it doesn’t do a refresh for a 200. Maybe if I try (200+302)/2, it will do both???
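One variation worth trying, offered as an untested sketch rather than a confirmed fix: have the refresh point explicitly at the page the visitor originally asked for, instead of relying on a bare Refresh:1 to reload whatever URL the browser ended up on. The sketch assumes the rewrite rule passes the original path in a query parameter. The parameter name "return" and the safe_return_path() helper are hypothetical, and the path check is there so the page cannot be used to bounce visitors to another site.

```php
<?php
// Hypothetical helper: accept only same-site absolute paths such as
// "/articles/foo"; anything else falls back to the home page.
function safe_return_path($candidate): string
{
    if (!is_string($candidate) || $candidate === '') {
        return '/';
    }
    // Must start with exactly one slash: "//host" and "/\host" can be
    // treated as another site by browsers, so reject them.
    $second = $candidate[1] ?? '';
    if ($candidate[0] !== '/' || $second === '/' || $second === '\\') {
        return '/';
    }
    // Reject control characters (including CR/LF) outright.
    if (preg_match('/[\x00-\x1F\x7F]/', $candidate)) {
        return '/';
    }
    return $candidate;
}

// Only talk to the browser when run by a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    // "return" is an assumed parameter name, not something Txp defines.
    $target = safe_return_path($_GET['return'] ?? '/');

    setcookie('bot', '1', 0, '/');         // same session cookie as before
    header('Refresh: 1; url=' . $target);  // go back to the original page
    echo '<p>Checking if human</p>';
}
```

Whether this avoids the loop depends on why the cookie is not being stored in the first place, which the thread has not yet established.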

Last edited by skewray (Today 06:49:28)

#19 Today 08:23:05

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,948

Browsers should send cookies on 302s. At least, they used to. Maybe that’s been changed.

Out of curiosity, what happens if you set SameSite to Lax?

// The options-array form of setcookie() needs PHP 7.3 or newer.
$cookie_options = array(
    'expires' => 0,        // session cookie
    'path' => '/',
    'samesite' => 'Lax',
);
setcookie('bot', 1, $cookie_options);
