Textpattern CMS support forum

#37 2025-09-28 16:54:30

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 268

Re: Txp human detection

Working solution, for posterity. It’s a variation of Stef’s suggestion.

RewriteRule   "^(articles/.+)"   $1?tt=%{TIME}   [L,R=303,CO=bot:1:www.mysite.com]

I then had the issue that LiteSpeed (a commercial Apache clone) adds a “.” in front of the cookie domain, so I have to expire that cookie in PHP; otherwise there are two cookies with the same name and everything gets confused:

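// LiteSpeed issues the cookie for ".www.mysite.com"; PHP re-issues it for the bare host.
if ( isset( $_COOKIE["bot"] ) ) {
    // Increment the visit counter carried by the cookie.
    $counter = 1 + (int) $_COOKIE["bot"] ;
    // Expire the dot-prefixed copy so only one "bot" cookie remains.
    setcookie( "bot", "", time() - 3600, "/", ".www.mysite.com" ) ;
}
else {
    $counter = 1 ;
}
// Store the counter as a host-only session cookie.
setcookie( "bot", (string) $counter, 0, "/" ) ;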

#38 Yesterday 18:33:03

skewray

Update: Since implementing the above, bot rejection is above 80%. Specifically, the rejected agents pretended to be browsers, appeared with no previous cookie, and attempted to revisit without a new cookie.
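
One possible reading of that rejection rule, sketched in PHP: a request that already carries the `tt` parameter added by the redirect, but sends no `bot` cookie, counts as a failed revisit. The function name `revisit_is_rejected` and this exact decision logic are assumptions for illustration, not necessarily what runs on the site.

```php
<?php
// Hypothetical helper: the name and the decision rule are illustrative assumptions.
// $query and $cookies stand in for $_GET and $_COOKIE so the logic can be tested.
function revisit_is_rejected(array $query, array $cookies): bool
{
    // The redirect appends ?tt=... and sets the "bot" cookie in the same response.
    $wasRedirected = isset($query['tt']);
    $hasCookie     = isset($cookies['bot']);

    // A client that followed the redirect but dropped the cookie is suspect.
    return $wasRedirected && !$hasCookie;
}
```

In a page handler this would be called as `revisit_is_rejected($_GET, $_COOKIE)`; what to do with a rejected request (403, empty page, and so on) is left open here.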

I could improve that by confirming that the revisit comes from the same IP address and arrives within some small time window. Crawler swarms violate both conditions. Implementing these checks in Apache would be a fun exercise that I might do on a rainy day.
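
For comparison, here is a rough sketch of the same two checks done in PHP rather than Apache. It assumes the cookie value would be a signed token binding the issue time to the client IP; the function names, the secret, and the 30-second window are all hypothetical, and the token would have to be issued by PHP instead of the `CO=` flag in the rewrite rule.

```php
<?php
// Hypothetical helpers; names, secret, and window length are assumptions.
const BOT_SECRET = 'change-me';   // assumed server-side secret
const BOT_WINDOW = 30;            // assumed max seconds between issue and revisit

// Build a token of the form "<unix time>:<HMAC of ip|time>".
function bot_make_token(string $ip, int $issuedAt, string $secret = BOT_SECRET): string
{
    $mac = hash_hmac('sha256', $ip . '|' . $issuedAt, $secret);
    return $issuedAt . ':' . $mac;
}

// True only if the token was issued to this IP and is still inside the window.
function bot_check_token(
    string $token,
    string $ip,
    int $now,
    string $secret = BOT_SECRET,
    int $window = BOT_WINDOW
): bool {
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || !preg_match('/^\d+$/', $parts[0])) {
        return false;                       // malformed token
    }
    $issuedAt = (int) $parts[0];
    $age = $now - $issuedAt;
    if ($age < 0 || $age > $window) {
        return false;                       // outside the time window
    }
    // Recompute the token for the current IP; a different IP gives a different MAC.
    return hash_equals(bot_make_token($ip, $issuedAt, $secret), $token);
}
```

Usage would look something like `bot_check_token($_COOKIE['bot'], $_SERVER['REMOTE_ADDR'], time())`. A swarm that fetches the redirect from one address and revisits from another fails the MAC comparison, and a slow revisit fails the window check.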
