Textpattern CMS support forum

#1 Yesterday 23:04:09

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 237

Apache: redirecting deep links

I have a list of publications, some of which link to locally stored PDF files. Crawlers deep-link to those files and fetch them directly, and I would like to gently redirect those requests to the publications page. The publications URL sets a cookie (“bot”) to a number. So my .htaccess file looks like this:

RewriteCond %{HTTP_COOKIE}              !"bot=\d*"
RewriteRule "\.pdf$"                    publications

This doesn’t work. It’s supposed to serve up the publications Txp section static page. Even when the cookie is set, the PDF is returned. It returns the PDF even if I take out the cookie RewriteCond. What am I doing dumb?

#2 Today 03:36:36

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,239

Re: Apache: redirecting deep links

Can you post a sample link the crawlers use?


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#3 Today 08:04:54

etc
Developer
Registered: 2010-11-11
Posts: 5,456

Re: Apache: redirecting deep links

The standard txp .htaccess file contains these rules:

    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^(.+) - [PT,L]

which mean that if a file/dir exists, it is served directly, and the [L] flag stops the remaining rules from being processed. Might that be interfering with your block?
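
A commented copy of that block may make the ordering question easier to reason about. This is only a sketch: it assumes any custom rules sit above the stock block in the same .htaccess file, and the comments describe the documented mod_rewrite behavior, which other servers may not reproduce exactly.

```apache
RewriteEngine On

# Custom rules (such as the PDF rule from the first post) would go here,
# above the stock block, so that they are evaluated first.

# Stock block: does the request map to an existing file or directory?
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
# "-" means no substitution, so the URL is left as it is.
# [L] ends this pass over the rule set; [PT] hands the result back to
# the server's normal URL mapping.
RewriteRule ^(.+) - [PT,L]
```

With this ordering, the stock block can only short-circuit a custom rule if the request still points at a real file or directory by the time the block is reached.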

#4 Today 15:03:36

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 237

Re: Apache: redirecting deep links

Yiannis:

104.222.174.240 - - [09/Sep/2025:02:31:01 +0000] "GET /file_download/54/20_spie5523-6.pdf HTTP/1.1" 200 817527 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3"
23.226.219.216 - - [09/Sep/2025:09:25:53 +0000] "GET /file_download/44/09_spie3355.pdf HTTP/1.1" 200 392117 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 OPR/117.0.0.0"
49.206.132.15 - - [10/Sep/2025:04:13:59 +0000] "GET /file_download/44/09_spie3355.pdf HTTP/2" 200 392117 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
49.233.159.76 - - [11/Sep/2025:17:31:28 +0000] "GET /file_download/46/12_spie4008.pdf HTTP/2" 200 471571 "skeurae,," "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1; rv:110.0) Gecko/20100101 Firefox/110.0"

If the cookie had been set, the referer would have been skeurae,,V1. I would normally have a lot more hits, but I currently reject all non-search-engine hits from the Azure, Google, and Amazon clouds.

Oleg:

I have the unmodified Txp .htaccess code at the bottom of the .htaccess file, so those lines are there. Now that you point it out, though, I see that my .htaccess lines are modifying REQUEST_URI, while those lines are using REQUEST_FILENAME. I bet this is the issue. (Do I add [N] to the RewriteRule? I’ve never used [N], to avoid loops. Will [N] recompute REQUEST_FILENAME?)

Edit: The Apache documentation says that REQUEST_FILENAME is rewritten when REQUEST_URI is updated by mod_rewrite. If this is true, then what I did should work. Of course, I have LiteSpeed and not Apache, so it may be buggy.
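
One possibility worth testing, offered as a sketch rather than a confirmed diagnosis: the logged requests are /file_download/... URLs, which Txp normally answers through index.php rather than as files on disk, so the -f test would not match them. An internal rewrite to publications still ends up at index.php, and PHP's REQUEST_URI normally keeps the URL the client originally asked for, so Txp may go on serving the download. An external redirect avoids that by making the client request the new URL. The fragment below assumes the section lives at /publications at the web root, tightens the cookie pattern to require at least one digit, and leaves the patterns unquoted. All three are assumptions, not the configuration from the first post.

```apache
# Sketch only: /publications at the web root is an assumption.
RewriteEngine On

# Apply the rule only when there is no "bot=<digits>" cookie.
# The (^|;\s*) prefix keeps a cookie such as "robot=1" from matching.
RewriteCond %{HTTP_COOKIE} !(^|;\s*)bot=\d+
# [R=302] turns this into an external (temporary) redirect, so the
# client issues a fresh request for /publications; [L] stops this pass.
RewriteRule \.pdf$ /publications [R=302,L]
```

Placed above the stock Txp block, this should answer a cookieless PDF request with a 302 to /publications. Requesting one of the logged URLs with `curl -I` would show whether the server honors the rule: a Location header pointing at /publications means it does.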

Last edited by skewray (Today 17:33:05)
