Textpattern CMS support forum
Apache: redirecting deep links
I have a list of publications, some of which link to locally stored PDF files. Crawlers deep link to those files and serve them directly, and I would like to gently redirect those requests. The publications page sets a cookie (“bot”) to a number. So, my .htaccess file looks like this:
RewriteCond %{HTTP_COOKIE} !"bot=\d*"
RewriteRule "\.pdf$" publications
This doesn’t work. It’s supposed to serve up the static page of the Txp publications section. Even when the cookie is set, the PDF is returned. It returns the PDF even if I take out the cookie RewriteCond. What am I doing that’s dumb?
Re: Apache: redirecting deep links
Can you post a sample link the crawlers use?
Yiannis
Re: Apache: redirecting deep links
The standard Txp .htaccess file contains these rules:
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+) - [PT,L]
which mean that if a file or directory exists, it is served directly, and the [L] flag stops any further rules from being processed. Might that be interfering with your block?
Re: Apache: redirecting deep links
Yiannis:
104.222.174.240 - - [09/Sep/2025:02:31:01 +0000] "GET /file_download/54/20_spie5523-6.pdf HTTP/1.1" 200 817527 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3"
23.226.219.216 - - [09/Sep/2025:09:25:53 +0000] "GET /file_download/44/09_spie3355.pdf HTTP/1.1" 200 392117 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 OPR/117.0.0.0"
49.206.132.15 - - [10/Sep/2025:04:13:59 +0000] "GET /file_download/44/09_spie3355.pdf HTTP/2" 200 392117 "skeurae,," "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
49.233.159.76 - - [11/Sep/2025:17:31:28 +0000] "GET /file_download/46/12_spie4008.pdf HTTP/2" 200 471571 "skeurae,," "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1; rv:110.0) Gecko/20100101 Firefox/110.0"
If the cookie had been set, the referer field would have been skeurae,,V1. I would normally have a lot more hits, but I currently reject all non-search-engine hits from the Azure, Google, and Amazon clouds.
Oleg:
I have the unmodified Txp .htaccess code at the bottom of the .htaccess file, so those lines are there. Now that you point it out, though, I see that my .htaccess lines modify REQUEST_URI, while those lines use REQUEST_FILENAME. I bet this is the issue. (Do I add [N] to the RewriteRule? I’ve never used [N], to avoid loops. Will [N] recompute REQUEST_FILENAME?)
Edit: The Apache documentation says that REQUEST_FILENAME is rewritten when REQUEST_URI is updated by mod_rewrite. If this is true, then what I did should work. Of course, I have LiteSpeed and not Apache, so it may be buggy.
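On stock Apache 2.4, one way to check what mod_rewrite actually does with REQUEST_URI and REQUEST_FILENAME is to turn on its trace logging, which writes each rule and condition it evaluates to the error log. This is a minimal sketch that assumes access to the server or virtual-host configuration, since LogLevel cannot be set from .htaccess. LiteSpeed has its own rewrite-logging setting, so this directive may not apply to the host in question:

```apache
# Server or <VirtualHost> configuration, NOT .htaccess (Apache 2.4+).
# Keep the general log level at "warn" and raise only mod_rewrite
# to trace3, so each rewrite step is written to the error log.
LogLevel warn rewrite:trace3
```

Trace output is verbose, so it is best switched back off once the rule behaves as expected.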
Last edited by skewray (Today 17:33:05)
Re: Apache: redirecting deep links
So, I haven’t solved the .htaccess mystery, but I did come up with something that works. Maybe better?
RewriteCond %{HTTP_COOKIE} !"bot=\d*"
RewriteRule "\.pdf$" publications [NC,L,R=303]
This forces the browser to load the static page of the publications section, by answering with a 303 “See Other” response code.
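One plausible reason the external redirect works where the internal rewrite did not (an assumption, not confirmed in this thread): Textpattern decides what to serve from the URI the client requested, so an internal rewrite to publications still reaches Txp's index.php as a /file_download/... request, whereas a 303 makes the client request the publications page itself. If the rule is kept, a slightly tighter variant might look like the sketch below. It is hypothetical and untested on LiteSpeed. It anchors the cookie name so that a cookie such as “robot=1” does not count, requires at least one digit, and uses an absolute target so the redirect does not depend on RewriteBase:

```apache
# Hypothetical refinement; place it above the stock Txp rules.
# Condition: no cookie named exactly "bot" holding one or more digits.
RewriteCond %{HTTP_COOKIE} !(^|;\s*)bot=\d+
# Any request ending in .pdf (case-insensitive) gets a 303 to the
# publications section; [L] stops processing of further rules.
RewriteRule \.pdf$ /publications [NC,R=303,L]
```

Switching from \d* to \d+ means a bare “bot=” no longer counts as set, which may or may not be what is wanted.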