
Textpattern CMS support forum


#13 Yesterday 14:59:39

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,949
Website GitHub

Re: Txp human detection

Is it too late in the chain to do anything in Txp itself? e.g. by (ab)using Pages named error_{statuscode} so they do some mapping skulduggery for you? error_303, error_404, etc. will get called every time Apache encounters the corresponding code, and you could take action or do further redirects/processing/filtering as appropriate.
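Here is a minimal PHP sketch of the lookup order this idea relies on. It assumes Txp prefers a Page named error_{code} and falls back to error_default when there isn't one. The function pick_error_page() and the $pages list are hypothetical illustrations, not Textpattern's own code:

```php
<?php
// Hypothetical sketch: choose which Page should render a given status code.
// Assumption: a Page named "error_{code}" wins; otherwise fall back to "error_default".
function pick_error_page(int $status, array $pages): ?string
{
    $specific = 'error_' . $status;
    if (in_array($specific, $pages, true)) {
        return $specific;
    }
    // No specific Page: use the generic one if it exists, else nothing.
    return in_array('error_default', $pages, true) ? 'error_default' : null;
}

$pages = ['default', 'error_default', 'error_404'];
echo pick_error_page(404, $pages), "\n"; // error_404
echo pick_error_page(410, $pages), "\n"; // error_default
```

Whatever Page is picked can then branch on the code and do its own redirect or filtering.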


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp


#14 Yesterday 16:31:50

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 255
Website Mastodon

Re: Txp human detection

That’s an interesting idea. My current .htaccess file always stops processing when setting a return code. I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?

(By the way, I can find no documentation on messy URLs. The admin panel lets you choose the URL scheme, but nowhere can I find a description of all the possible variations that would lead to a given Txp page.)


#15 Yesterday 16:57:37

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,949
Website GitHub

Re: Txp human detection

skewray wrote #340604:

I assume that I would set a status code in .htaccess, it drops through to Txp, and then Txp would run an error_xxx page immediately?

That’s the general idea, yes. It allows you to display different stuff based on the server return code – or, at least, the cut-down set of codes we actually handle.

See txp:if_status and error_status (which you could use in an attribute of a <txp:evaluate> tag to test, if you prefer).

By the way, I can find no documentation on messy URLs.

It’s all messy syntax under the hood. The clean URLs are just prettier replacements. The main culprits are:

  • id= Asset ID (cross-content type)
  • s= Article Section
  • c= Category (cross-content type)
  • author= Author (cross-content type)
  • q= Search
  • month= Month
  • pg= Current results page (for pagination)
  • filename= File (if s=file_download)
  • f= Form (for passing content to Forms from the URL)
  • status= Web server status code for the resource being accessed
  • p= (legacy) Image extraction and navigation
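To show how these fit together, here is a small PHP sketch that assembles a messy URL from some of the parameters above. It assumes the site's front controller is index.php at the web root. messy_url() is a hypothetical helper, not part of Txp:

```php
<?php
// Sketch: build a "messy" Textpattern URL from query parameters.
// Assumption: the front controller is index.php at the site root.
function messy_url(string $base, array $params): string
{
    // Drop empty values so we don't emit fragments like "c=&".
    $params = array_filter($params, fn($v) => $v !== null && $v !== '');
    $url = rtrim($base, '/') . '/index.php';
    return $params ? $url . '?' . http_build_query($params) : $url;
}

echo messy_url('https://example.com', ['s' => 'articles', 'c' => 'news', 'pg' => 2]), "\n";
// https://example.com/index.php?s=articles&c=news&pg=2
echo messy_url('https://example.com', ['q' => 'tea pot']), "\n";
// https://example.com/index.php?q=tea+pot
```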

There are many others used internally, and if you start donkeying with those, things will probably break. Examples are context, feed, skin (front-end theme), page, css, and so on.
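A companion sketch, equally hypothetical, splits a messy URL's query string into the parameters listed above and everything else, so anything potentially internal (skin, page, and so on) can be left alone. The PUBLIC_PARAMS list is copied from this post, not from the Txp source, so it may be incomplete:

```php
<?php
// Parameters listed in this post; everything else is treated as "possibly internal".
const PUBLIC_PARAMS = ['id', 's', 'c', 'author', 'q', 'month', 'pg', 'filename', 'f', 'status', 'p'];

// Returns [public params, other params] for a URL's query string.
function split_params(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY) ?? '';
    parse_str($query, $params);
    $public = array_intersect_key($params, array_flip(PUBLIC_PARAMS));
    $other  = array_diff_key($params, $public);
    return [$public, $other];
}

[$public, $other] = split_params('/index.php?s=articles&pg=2&skin=mytheme');
echo json_encode($public), "\n"; // {"s":"articles","pg":"2"}
echo json_encode($other), "\n";  // {"skin":"mytheme"}
```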



#16 Yesterday 18:08:34

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 255
Website Mastodon

Re: Txp human detection

No messy code for article title, so that avenue is out.

Egregiously, 418 I'm a Teapot is missing from the Txp source. That should be added, along with an audio file of someone singing the song, “I’m a little teapot, short and stout.”

So,
  • .htaccess detects no cookie and sets a 302 code
  • Txp fires up error_302
  • error_302 uses PHP to set the cookie and a refresh header at the top of the page
  • a real browser refreshes the URL, loading the content
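Step 1 could be expressed in PHP roughly as follows, mirroring the intent of a no-cookie check on article URLs. should_challenge() is a hypothetical name and the regexes are assumptions. One detail worth noting: an unanchored pattern such as bot=(\d+) also matches a cookie named robot, so this sketch anchors on a cookie boundary:

```php
<?php
// Challenge a request only when it targets an article and carries no "bot" cookie.
function should_challenge(string $path, string $cookieHeader): bool
{
    // Leading slash optional: .htaccess rules see the path without it.
    $isArticle = preg_match('#^/?articles/.+#', $path) === 1;
    // Anchor on a cookie boundary so e.g. "robot=1" does not count as "bot=1".
    $hasCookie = preg_match('/(?:^|;\s*)bot=\d+/', $cookieHeader) === 1;
    return $isArticle && !$hasCookie;
}

var_dump(should_challenge('/articles/42/hello', ''));        // bool(true)
var_dump(should_challenge('/articles/42/hello', 'bot=1'));   // bool(false)
var_dump(should_challenge('/articles/42/hello', 'robot=1')); // bool(true)
```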

Except, when I RTFC for Txp 4.8.8, it looks like 30X codes are redirected without first checking for error_xxx pages. It really should be doing that in the other order. No 20X or 40X codes seem appropriate.


#17 Yesterday 18:49:03

vistopher
Plugin Author
Registered: 2025-09-15
Posts: 9
Website GitHub

Re: Txp human detection

skewray wrote #340559:

Shared, so I have almost no control. It is cPanel; all I can do is tinker with .htaccess.

I highly recommend running your DNS through Cloudflare. They have powerful tools to stop aggressive bots and AI bots.


#18 Today 06:48:36

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 255
Website Mastodon

Re: Txp human detection

I created a brilliant bit of PHP (mostly ripped from txp_die()) and put it in human.php:

<?php
// Session cookie (expires = 0) valid site-wide; the .htaccess rule looks for it.
setcookie("bot", "1", 0, "/");
// Ask the browser to reload the current URL after 1 second.
header("Refresh: 1");
$out = <<<EOD
<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="utf-8">
   <meta name="robots" content="noindex">
   <title>Checking if Human</title>
</head>
<body>
    <h1>Checking if human</h1>
    <p>This page should refresh by itself, if you don't do so first.</p>
</body>
</html>
EOD;
echo $out;

Then I added to .htaccess:

RewriteCond %{HTTP_COOKIE}     !"bot=(\d+)"
RewriteRule "^articles/.+"     human.php              [L,R=302]

This refreshes, but the cookie is never set, so it loops forever. If I modify it to not return 302,

RewriteCond %{HTTP_COOKIE}    !"bot=(\d+)"
RewriteRule "^articles/.+"    human.php              [L]

Then the cookie is set, but the refresh does not happen. (A manual refresh loads the original Txp content.)

So, apparently the browser doesn’t load the cookie if the return code is a 302, and it doesn’t do a refresh for a 200. Maybe if I try (200+302)/2, it will do both???

Last edited by skewray (Today 06:49:28)


#19 Today 08:23:05

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,949
Website GitHub

Re: Txp human detection

Browsers should send cookies on 302s. At least, they used to. Maybe that’s been changed.

Out of curiosity, what happens if you set SameSite to Lax?

$cookie_options = array(
    'expires' => 0,
    'path' => '/',
    'samesite' => 'Lax',
);
setcookie('bot', '1', $cookie_options); // the options-array form needs PHP 7.3+


#20 Today 16:08:44

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 255
Website Mastodon

Re: Txp human detection

Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.

I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.
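One possible explanation, offered as a hypothesis rather than a diagnosis: with [R=302] the browser's address bar becomes /human.php, and a bare Refresh header reloads the current URL, so the browser keeps reloading human.php whether or not the cookie is set. Below is a minimal PHP sketch of a Refresh value that names the article explicitly. It assumes the internal-rewrite ([L] only) variant, where $_SERVER['REQUEST_URI'] still holds the original article path; after an external 302 that path would have to be passed along some other way. refresh_header() is a hypothetical helper:

```php
<?php
// Point the Refresh header at the article explicitly, rather than reloading
// whatever URL is in the address bar (which is human.php after an R=302).
function refresh_header(int $delay, string $target): string
{
    // Accept only same-site absolute paths, to reduce open-redirect risk.
    if ($target === '' || $target[0] !== '/' || substr($target, 0, 2) === '//') {
        $target = '/';
    }
    return sprintf('Refresh: %d; url=%s', $delay, $target);
}

// With an internal rewrite ([L], no R=302), REQUEST_URI keeps the article path.
$target = $_SERVER['REQUEST_URI'] ?? '/';
echo refresh_header(1, $target), "\n";
// In human.php this would be: header(refresh_header(1, $target));
```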


#21 Today 16:16:54

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,489
Bitbucket GitHub

Re: Txp human detection

At the risk of wrath, I’m gonna ask this question…have you considered Cloudflare (free tier) protection?


#22 Today 16:20:18

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,260
Website GitHub Mastodon Twitter

Re: Txp human detection

skewray wrote #340616:

Lax is supposed to be the default, and it was for me. I switched to your version anyway, in case a user has a creative browser.

I put in a 10-second refresh so I could check, and it turns out the cookie is being set. The browser is reloading the human.php version again. I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?), or if I have .htaccess arcana going on.

Hi,
you can send a no-cache Cache-Control header. That normally works.
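As a minimal PHP sketch of that suggestion, human.php could send the standard anti-caching headers before any output. The helper names are hypothetical; the header values themselves are ordinary HTTP:

```php
<?php
// Headers asking browsers and proxies not to reuse a stored copy of the check page,
// so the refresh actually goes back to the server.
function no_cache_headers(): array
{
    return [
        'Cache-Control: no-store, no-cache, must-revalidate, max-age=0',
        'Pragma: no-cache', // for old HTTP/1.0 caches
        'Expires: 0',       // an invalid date, which caches must treat as already expired
    ];
}

// Call this in human.php before any output (including the heredoc echo).
function send_no_cache_headers(): void
{
    foreach (no_cache_headers() as $h) {
        header($h);
    }
}
```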


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.


#23 Today 16:22:46

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 255
Website Mastodon

Re: Txp human detection

First, I would have to switch hosting services, which I don’t really want to do, and my hosting service does have something similar. Second, I don’t think Cloudflare stops the bots in question.

For those with Cloudflare: does it stop bots that come from an ISP, make a single request, and never come back? Can you check your logs?
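For anyone checking their logs, here is a rough PHP sketch that lists client IPs appearing exactly once. It assumes Apache Common or Combined Log Format (client IP as the first field); single_hit_ips() is a hypothetical helper, and the sample lines use documentation IP ranges, not real traffic:

```php
<?php
// Find client IPs that appear exactly once in an access log
// (the "single access and never come back" pattern).
function single_hit_ips(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        $ip = strtok(trim($line), ' '); // first whitespace-separated field
        if ($ip === false || $ip === '') {
            continue; // blank line
        }
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
    return array_keys(array_filter($counts, fn($n) => $n === 1));
}

$log = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /articles/1 HTTP/1.1" 200 512',
    '198.51.100.7 - - [01/Jan/2025:10:00:01 +0000] "GET /articles/2 HTTP/1.1" 200 498',
    '203.0.113.5 - - [01/Jan/2025:10:00:05 +0000] "GET /articles/3 HTTP/1.1" 200 530',
];
echo implode("\n", single_hit_ips($log)), "\n"; // 198.51.100.7
```

For a real log file, single_hit_ips(new SplFileObject('access.log')) iterates it line by line.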


#24 Today 16:32:23

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,949
Website GitHub

Re: Txp human detection

skewray wrote #340616:

I’m not sure if the browser is caching (it shouldn’t do that on refresh, right?)

Might be getting silly, but adding a cache-busting ?v=some-timestamp-value to your requests might break out of the loop.
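Here is a minimal PHP sketch of that cache-buster. It assumes human.php and Txp simply ignore the v parameter. with_cache_buster() is a hypothetical helper, and it does not handle URLs that contain a #fragment:

```php
<?php
// Append a cache-busting "v" parameter, keeping any existing query string.
// Each distinct value makes the browser treat the URL as a new resource.
function with_cache_buster(string $url, int $stamp): string
{
    $sep = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $sep . 'v=' . $stamp;
}

echo with_cache_buster('/articles/42/hello', 1700000000), "\n";    // /articles/42/hello?v=1700000000
echo with_cache_buster('/index.php?s=articles', 1700000000), "\n"; // /index.php?s=articles&v=1700000000
// In human.php: header('Refresh: 1; url=' . with_cache_buster($target, time()));
```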


