Textpattern CMS support forum
Re: Txp human detection
Working solution, for posterity. It’s a variation of Stef’s suggestion.
RewriteRule "^(articles/.+)" $1?tt=%{TIME} [L,R=303,CO=bot:1:www.mysite.com]
I then ran into the issue that LiteSpeed (a commercial drop-in replacement for Apache) prepends a “.” to the cookie domain, so I have to expire that cookie in PHP; otherwise, the client ends up with two cookies of the same name and things get confused:
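if ( isset( $_COOKIE["bot"] ) ) {
    $counter = 1 + (int) $_COOKIE["bot"] ;
    // Expire the duplicate cookie that LiteSpeed set on the dot-prefixed domain.
    setcookie( "bot", "", time() - 3600, "/", ".www.mysite.com" ) ;
}
else {
    $counter = 1 ;
}
// Re-issue the counter as a host-only session cookie.
setcookie( "bot", (string) $counter, 0, "/" ) ;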
Re: Txp human detection
Update: Since implementing the above, bot rejection is above 80%. Specifically, the rejected agents were those that pretended to be browsers, appeared with no previous cookie, and attempted to revisit without a new cookie.
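For readers who want the decision spelled out, here is a minimal PHP sketch of one way to read those three criteria. It is an interpretation, not the code running on the site: the function name `classify_request`, the return labels, the browser-detection regex, and the assumption that a `tt` query parameter marks a request that has already been redirected are all illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Classify a request under a redirect-plus-cookie scheme.
 * Hypothetical helper: the name, labels, and regex are assumptions.
 */
function classify_request(string $userAgent, array $query, array $cookies): string
{
    // Only agents that claim to be browsers are put through the check.
    if (preg_match('/Mozilla\/\d/', $userAgent) !== 1) {
        return 'not-checked';
    }

    $redirected = isset($query['tt']);    // assumed marker added by the redirect
    $hasCookie  = isset($cookies['bot']); // cookie set alongside the redirect

    if ($hasCookie) {
        return 'accept';    // the cookie came back, as a real browser would send it
    }
    if ($redirected) {
        return 'reject';    // followed the redirect but dropped the cookie
    }
    return 'challenge';     // first visit: issue the redirect and the cookie
}
```

In a live page the arguments would come from `$_SERVER['HTTP_USER_AGENT']`, `$_GET`, and `$_COOKIE`; passing them in keeps the function easy to test in isolation.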
I could improve on that by confirming that the revisit comes from the same IP address and that it falls within some small time window. Crawler swarms violate both conditions. Implementing these checks in Apache would be a fun exercise that I might do on a rainy day.
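As a rough illustration of the same-IP and time-window idea, here is a sketch in PHP rather than Apache. It assumes the cookie value is replaced by a signed token instead of the plain counter used above. The names `make_token`, `is_valid_revisit`, and `BOT_SECRET`, along with the 10-second default window, are hypothetical choices for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical secret; in practice it would be loaded from configuration.
const BOT_SECRET = 'change-me';

/** Build a cookie value that binds the first visit to an IP and a timestamp. */
function make_token(string $ip, int $issuedAt, string $secret = BOT_SECRET): string
{
    $mac = hash_hmac('sha256', $ip . '|' . $issuedAt, $secret);
    return $issuedAt . ':' . $mac;
}

/** True if the token was issued to this IP at most $window seconds ago. */
function is_valid_revisit(
    string $token,
    string $ip,
    int $now,
    int $window = 10,
    string $secret = BOT_SECRET
): bool {
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false;                      // malformed token
    }
    $issuedAt = (int) $parts[0];
    $age = $now - $issuedAt;
    if ($age < 0 || $age > $window) {
        return false;                      // outside the time window
    }
    // Recompute the MAC with the current IP; a different IP will not match.
    $expected = hash_hmac('sha256', $ip . '|' . $issuedAt, $secret);
    return hash_equals($expected, $parts[1]);
}
```

On the first visit the page would set the cookie to `make_token($_SERVER['REMOTE_ADDR'], time())`, and on the revisit it would call `is_valid_revisit($_COOKIE['bot'], $_SERVER['REMOTE_ADDR'], time())`. Signing the value means a crawler cannot forge a fresh timestamp or hand the cookie to another machine in the swarm without failing the check.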