Textpattern CMS support forum

#1 2025-06-02 00:48:13

phiw13
Plugin Author
From: South-Western Japan
Registered: 2004-02-27
Posts: 3,391
Website

Pageless section indexed by googlebot

Or at least, the/some articles in that pageless section have been found and the Gaggle search console is angry at me (it flags a 404 for those articles).

The 404 response is, of course, expected: requesting one of those URLs manually returns an “unknown section” error with the 404 status code. I currently have no idea how that happened. As far as I can see, the section never appears in a <txp:section_list />. The actual section page that displays those articles has no link to the individual articles (<txp:permlink />), neither in the template(s) nor in the output.

I tried adding a RedirectMatch 308 ^\/hidden-section-name\/(.*)$ https://domain.tld/real-section/ to the htaccess file (permanent redirect), but that does not work. The Textpattern URL handler takes over – sees an unknown section and issues a 404.
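To check which request paths the RedirectMatch pattern above actually catches, here is a minimal Python sketch. It assumes Python's re module is a fair stand-in for Apache's PCRE for a simple anchored pattern like this one, and that mod_alias tests the pattern against the full URL-path (which starts with "/"). The section name and target URL are the placeholders from the post.

```python
import re
from typing import Optional

# Pattern copied from the RedirectMatch directive above. mod_alias tests it
# against the URL-path, which starts with "/".
PATTERN = re.compile(r"^\/hidden-section-name\/(.*)$")
TARGET = "https://domain.tld/real-section/"


def redirect_for(url_path: str) -> Optional[str]:
    """Return the redirect target if the pattern matches, else None.

    Approximation only: Apache uses PCRE, Python uses its own re module,
    but for a simple anchored pattern like this they behave the same.
    """
    if PATTERN.search(url_path):
        return TARGET
    return None


if __name__ == "__main__":
    for path in (
        "/hidden-section-name/some-article",   # article URL: redirected
        "/hidden-section-name/",               # section URL with slash: redirected
        "/hidden-section-name",                # no trailing slash: NOT matched
        "/hidden-section-names/some-article",  # one extra letter: NOT matched
    ):
        print(path, "->", redirect_for(path))
```

Two things stand out: the bare section URL without a trailing slash is not caught by this pattern, and a single wrong letter in the section name makes it miss every URL, so the request falls through to Textpattern and its 404.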

Question: how can I – for googlebot – send a permanent redirect instead (in theory my regex above does work)? I’ve already added the pageless section name to the robots.txt file.

Edit: typo: Redirect -> RedirectMatch
(wrong copy-pasting; it was correct in the htaccess)

Last edited by phiw13 (2025-06-03 00:59:42)


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern
phiw13 on Codeberg

#2 2025-06-02 06:38:00

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,925
Website GitHub

Re: Pageless section indexed by googlebot

Two guesses:

  • If you have a sitemap that loops over all sections, maybe the pageless sections haven’t been excluded?
  • Maybe the rss/atom feed includes pageless sections if you don’t mark them as not to be syndicated?
    It’s possible that I’m not looking at the right function, but the one I found only seems to check whether a section is in_rss.
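Both guesses come down to which filter a sitemap or feed loop applies. The minimal Python sketch below uses a hypothetical Section record: in_rss mirrors the “syndicate” setting mentioned above, while page=None is an assumed stand-in for a pageless section, not how Textpattern actually stores it. It shows how a loop that only checks in_rss still lets a pageless section through, while a loop that requires a page does not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    # Hypothetical model of a section record, for illustration only.
    name: str
    in_rss: bool          # the "syndicate" setting
    page: Optional[str]   # None stands in for a pageless section (assumption)


def feed_sections(sections: List[Section]) -> List[str]:
    """Feed filter that, like the function described above, only checks in_rss."""
    return [s.name for s in sections if s.in_rss]


def sitemap_sections(sections: List[Section]) -> List[str]:
    """Sitemap filter that explicitly skips pageless sections."""
    return [s.name for s in sections if s.page is not None]


sections = [
    Section("articles", in_rss=True, page="default"),
    Section("people-article", in_rss=True, page=None),  # pageless but still syndicated
    Section("real-section", in_rss=True, page="default"),
]

if __name__ == "__main__":
    print(feed_sections(sections))     # includes people-article
    print(sitemap_sections(sections))  # excludes people-article
```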

TXP Builders – finely-crafted code, design and txp

#3 2025-06-02 06:39:02

etc
Developer
Registered: 2010-11-11
Posts: 5,393
Website GitHub

Re: Pageless section indexed by googlebot

phiw13 wrote #339759:

I tried adding a Redirect 308 ^\/hidden-section-name\/(.*)$ https://domain.tld/real-section/ to the htaccess file (permanent redirect), but that does not work. The Textpattern URL handler takes over – sees an unknown section and issues a 404.

That’s strange, htaccess should act before txp (and even php) is loaded.

Question: how can I – for googlebot – send a redirect permanent instead (in theory my regex above does work). I’ve already added the pageless section name to the robots.txt file.

You can try (in 4.9 at least)

<txp:if_section not name>
<!-- redirect with txp:header here -->
</txp:if_section>

or

<txp:if_article_section not name>
<!-- redirect with txp:header here -->
</txp:if_article_section>

in your error page template (untested).

#4 2025-06-02 08:33:14

phiw13
Plugin Author
From: South-Western Japan
Registered: 2004-02-27
Posts: 3,391
Website

Re: Pageless section indexed by googlebot

jakob wrote #339760:

Two guesses:

  • If you have a sitemap that loops over all sections, maybe the pageless sections haven’t been excluded?
  • Maybe the rss/atom feed includes pageless sections if you don’t mark them as not to be syndicated?

The pageless section(s) are not included in <txp:section_list /> or <txp:article_custom /> by default. In my testing, article tags in a listing context can show them if the section is listed explicitly. The hidden pageless section is excluded from the feeds and the front page in the Sections admin panel. It is included in search, but my search result template is built to use a custom URL for those articles (pointing to the real section, with the article URL-title as the fragment).

<txp:if_article_section name="hidden_section">
    <h2><a href="<txp:site_url />real-section/#<txp:article_url_title />">About: <txp:title /></a></h2>
    <p><txp:search_result_excerpt hilight="mark" /></p>
<txp:else />
[…]
</txp:if_article_section>

PS – this is the type of pageless section I am talking about: https://textpattern.com/weblog/feature-focus-live-pageless-sections-for-hidden-content

etc wrote #339761:

That’s strange, htaccess should act before txp (and even php) is loaded.

[…]

I will double-check my htaccess snippet and try those checks in the default error template. One of them will do, I am sure.

Thank you both.

Last edited by phiw13 (2025-06-02 08:33:36)

#5 2025-06-03 01:13:21

phiw13
Plugin Author
From: South-Western Japan
Registered: 2004-02-27
Posts: 3,391
Website

Re: Pageless section indexed by googlebot

Update: I got the htaccess redirect to work smoothly once I corrected a typo in my section name (people-articles is not the same as people-article… the latter is what I needed).

I did not succeed in getting the redirect to work in the default error template. Only a request for the pageless section name itself worked, not for individual articles: people-article redirected OK, but people-article/name1 failed. I think I’d need to do something with <txp:page_url type="request_uri" /> or similar, but I haven’t tried yet.
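The idea of deciding on the raw request URI, rather than on the section Textpattern resolved, can be sketched in Python as follows. This is not Textpattern code: it assumes the raw request URI is available (as <txp:page_url type="request_uri" /> would provide in a template), and the section name and target URL are placeholders taken from this thread.

```python
from typing import Optional
from urllib.parse import urlsplit

# Placeholders from this thread; adjust to the real section and target.
HIDDEN_SECTION = "people-article"
TARGET = "https://domain.tld/real-section/"


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the redirect target when the first path segment is the
    hidden section, whether or not an article URL-title follows it."""
    path = urlsplit(request_uri).path          # drop any query string
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] == HIDDEN_SECTION:
        return TARGET
    return None


if __name__ == "__main__":
    for uri in ("/people-article", "/people-article/name1",
                "/people-article/name1?utm_source=x", "/people-articles/name1"):
        print(uri, "->", redirect_target(uri))
```

Because only the first path segment is compared, /people-article and /people-article/name1 both map to the same target, while /people-articles/name1 does not.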

Still, I have no idea how googlebot could find that hidden pageless section…

Last edited by phiw13 (2025-06-03 01:14:32)
