Textpattern CMS support forum

#1 2022-08-17 15:14:58

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Avoid single article URLs like /section/title

I think I have read a solution for this before but cannot seem to find it here.
My problem is that my website mainly works with sections and article lists.
The exception is the blog section, which of course uses the /section/title scheme for the detail page of each blog article.

The website is pretty new and SEO is crucial here.
I added the RSS and Atom feed tags from the default page template (but removed them just now), and also a sitemap.
The sitemap shows the sections plus the individual blog articles, so it does exactly what I want.

Now I see in Google Search Console that it has just indexed a URL (let's call it /fruits/apple) that is not from my blog section.
GSC says it got it from the feed. =(
As canonical it also has /fruits/apple. But my /fruits/ URL has the same content as my /fruits/apple URL, so that is not good.

What would be the best way to help Google understand that it should only index (or find) URLs of the type /blog/story-about-fruits?
And ideally also understand that /fruits/apple was a mistake? (I have already redirected this one individually to /fruits/.)

I hope I could explain it properly. Any thoughts on that would help me a lot. Also for future projects.

Simply not using the feed, so that Google is not given the URL in the first place, is not a solution as far as I know,
because there can always be some other way for Google to discover this URL.

I guess some kind of redirect regex that spares one section (blog in my case) would do the trick. But is this the best way?

Offline

#2 2022-08-18 02:54:09

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,225
Website

Re: Avoid single article URLs like /section/title

Might this article help: “Feature focus: live pageless sections for hidden content”? I have a couple of sections on a site based on that, and search engines never see the single article URI, afaict (!).


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

Offline

#3 2022-08-18 05:31:56

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Avoid single article URLs like /section/title

Thanks phiw13.

I guess that would help, and I will take it into consideration when setting up the next Textpattern site. But in my case it would mean that I have to restructure the whole site, because every section apart from /blog/ would need its own hidden section for its content. That would also mean that I need at least 2 sections for every menu item: one that “creates” the URL and holds the page etc., and another, hidden one that delivers the content.

I was aware of that feature before but never thought about the fact that it could help to hide the single article URLs. I already use it to show team members on an about page. All fine with that.

But using it throughout the whole site would mean a whole bunch of extra sections; basically it would double the number. For the client it could get confusing when an article that belongs to the section “fruits” has to be assigned to “fruits-content” or similar.

Other ideas I could think of are:

  • a canonical to /section/ on every single-article-page apart from /blog/single-article
  • a noindex on every single-article-page apart from /blog/single-article
  • an .htaccess regex redirect from /section/single-article to /section/ apart from /blog/single-article

None of these solves the root of the problem. But maybe there is no other way?
Which would be best in your opinion?

Offline

#4 2022-08-18 07:53:06

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,225
Website

Re: Avoid single article URLs like /section/title

Ouch… I didn’t realise you are using those sections site-wide. That solution is not really appropriate for the situation you face I think – unless very carefully designed in advance.

One thing though: if the search engine issue is due to your sitemap, then see if you can construct it in a way that does not reference the individual article URLs. IOW your sitemap only references the /section/ page with the <lastmod /> date-time stamp. Not sure: is there a way to tell the robots “revisit this every 4 hours, or whatever suits your publication schedule”?

Otherwise, the first two are probably useful anyway:

  • a canonical to /section/ on every single-article-page apart from /blog/single-article
  • a noindex on every single-article-page apart from /blog/single-article
  • an .htaccess regex redirect from /section/single-article to /section/ apart from /blog/single-article

hopefully someone else has additional suggestions.


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

Offline

#5 2022-08-18 09:02:34

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Avoid single article URLs like /section/title

Ouch… I didn’t realise you are using those sections site-wide. That solution is not really appropriate for the situation you face I think – unless very carefully designed in advance.

OK. I am wondering whether I have been using the wrong construction all these years, then.
Let's say you have a website with home, service, about and contact. Then those would be sections, right? The content of those sections would come from articles assigned to those sections (if the approach we discussed earlier is not used). And sometimes you would be able to click deeper, down to article level, as in a blog for example. How would I construct it in a more appropriate way? I am super curious now.

One thing though – if the search engine issue is due to your sitemap…

My sitemap is fine. It does not show those articles; I constructed it that way. The problem is that the feeds show them. And you never know how else Google might detect that they actually exist. So I would love a solution where they are simply not there, or where it is at least watertight that Google does not take them into consideration… or it is my construction that is faulty. So I am curious how you would construct it differently.

Last edited by demoncleaner (2022-08-18 09:03:40)

Offline

#6 2022-08-18 09:21:35

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,225
Website

Re: Avoid single article URLs like /section/title

demoncleaner wrote #333797:

OK. I am wondering if I am using the wrong construction all those years then.
Let's say you have a website with home, service, about and contact. Then those would be sections, right? The content of those sections would come from articles assigned to those sections. (If the way we discussed earlier is not used.).

No, there is nothing wrong with that concept. I was referring to the architecture outlined in the article I linked to, which hides the individual article URL everywhere.

As for the feed, can you make the feed for one section only (/blog/)? See example 1 in the docs.
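
A minimal sketch of what a blog-only feed link could look like in the page head, assuming the section is named "blog" as in this thread (the label text is just an illustration):

```html
<!-- Advertise feeds for the "blog" section only, instead of the site-wide feed -->
<txp:feed_link flavor="rss" format="link" section="blog" label="Blog (RSS)" />
<txp:feed_link flavor="atom" format="link" section="blog" label="Blog (Atom)" />
```

This only changes which feed URL is advertised in the page head. Whether articles from other sections still appear in the site-wide feed depends on each section's syndication setting.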


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

Offline

#7 2022-08-18 10:55:40

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,743
Website

Re: Avoid single article URLs like /section/title

Even if it’s not precisely the “TXP-way”, I think your setup doesn’t sound unusual and I have certainly seen lots of sites like that. There are often cases where people would rather split up the parts of a single URL into several sections, each with its own article that is not designed to be seen on its own. Or you have perhaps portfolio items, some of which have a profile page and some just a “card” on the overview page with no click-through to the article. In earlier txp versions we didn’t have ‘hidden sections’ to work with, so that’s how it was.

I agree with both of you on the possible solution.

  • tailor your canonical url to avoid showing the sub-pages on those sections. I think that’s the most important one.
  • restrict your feed to only those sections that require a per-article feed.
  • noindex is an additional safety option for pages that shouldn’t be logged, but really the canonical url should cover that.
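
The canonical and noindex suggestions could be sketched roughly like this in the shared head form. It assumes "blog" is the only section whose individual article pages should be indexed, and that the head form is parsed on every page:

```html
<txp:if_individual_article>
    <txp:if_section name="blog">
        <!-- Real article page: canonical points at the article itself -->
        <link rel="canonical" href="<txp:permlink />" />
    <txp:else />
        <!-- Any other /section/title URL: point search engines at the section landing page -->
        <link rel="canonical" href="<txp:section url="1" />" />
        <!-- Optional alternative to the canonical above (assumption: use one or the other,
             since search engines may treat the combination as conflicting signals):
        <meta name="robots" content="noindex, follow" />
        -->
    </txp:if_section>
</txp:if_individual_article>
```

Section list pages are not covered here and would keep whatever canonical the template already emits for them.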

I wouldn’t bother with the redirect to /section/ unless your page template doesn’t respond to both urls with the same output.

We don’t have a /section-only/ url scheme for the sections like we do for titles. As far as I remember /section/title always works regardless of what the url scheme is set to, so it may make no sense.

Another idea: you could theoretically construct your own rss-feed output along the same lines as the custom sitemap.xml output. It is a bit of extra work, though.


TXP Builders – finely-crafted code, design and txp

Offline

#8 2022-08-18 11:21:50

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,468
Website GitHub

Re: Avoid single article URLs like /section/title

Yeah, nothing is ideal for SEO in these cases. I have a few sites with a mix of blog-style most-recent-first sections and ‘corporate’ single page sections. I tend to get round that in a few ways.

My usual approach is two page templates. Most of the template content itself comes from Forms anyway (the head, the nav, the footer, etc) so it’s all shared. The only bits that differ are how the <txp:article> tag is constructed (with limit="1" or not) and whether inter-article nav is shown. All my ‘single page’ sections then get the ‘single’ Page template, and all other sections get the ‘regular’ Page template, apart from the front page that gets ‘default’.

That doesn’t directly solve the issue of what happens if a search engine hits the /section/title page but it does mean I can more easily construct conditional tags around the page titles that don’t contain <txp:permlink> tags. The page_url tag has a page attribute which you can wrap in <txp:evaluate> or build a <txp:variable> that you set somewhere near the top of your page flow and can test further down to see which page template you’re serving. That gives you some great options to do conditional branching based on the name of the template in use, which is often more useful than testing the section name itself (as you could decide to move a section to use a different template in future, or make a new section and would have to hunt down your <txp:if_section> tags to modify them).

A more radical approach is to actually not have the content in articles at all for those sections. If it’s fairly static info and you’re comfy not giving access so your client can modify the content, then stick it in a Form and use some conditional logic like the above to fetch the Form with the same name as the section. Then search engines can’t find the articles at /section/title because there aren’t any! Bit of a hack, but it works.
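
A rough sketch of the Form-per-section idea, using a plain section test rather than the page-template test described above. It assumes one Form exists for every non-blog section, named exactly like the section (for example a hypothetical Form called "fruits"):

```html
<txp:if_section name="blog">
    <!-- The blog keeps its normal article list and individual article pages -->
    <txp:article />
<txp:else />
    <!-- Static sections: output the Form named after the current section -->
    <txp:output_form form='<txp:section />' />
</txp:if_section>
```

The single quotes around the form attribute value ask Textpattern to parse the nested tag, so the current section name is used as the Form name.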

Expanding on what jakob suggests, IIRC there’s also a callback for the feed tags that should allow you to tweak what happens. Maybe in there, you could apply similar logic to the above and only output feed info for certain sections or page templates? Not tried it.

Last edited by Bloke (2022-08-18 11:24:03)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#9 2022-08-18 11:32:24

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Avoid single article URLs like /section/title

Thanks a lot guys. I will look into all of your suggestions carefully. They help a lot.

Especially this:

The page_url tag has a page attribute which you can wrap in <txp:evaluate> or build a <txp:variable> that you set somewhere near the top of your page flow and can test further down to see which page template you’re serving.

sounds super interesting. Indeed I do not like using <txp:if_section> too much, for the reasons given. Especially on multilingual sites with multilingual sections I always needed a workaround for that. Seems I don't need one anymore.

Great stuff!

Offline

#10 2022-08-18 11:34:56

etc
Developer
Registered: 2010-11-11
Posts: 5,237
Website GitHub

Re: Avoid single article URLs like /section/title

Why not just send a 404 (via <txp:txp_die /> or <txp:header />) on individual article access, to prevent their discovery via /section/title or index.php?id=nn?

Offline

#11 2022-08-18 11:46:07

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Avoid single article URLs like /section/title

Genius!

I just put this in my header-part of my template.

<txp:if_section name="blog" not>
  <txp:if_individual_article>
    <txp:txp_die />
  </txp:if_individual_article>
</txp:if_section>

But that throws a 503 error (Service unavailable).
I guess working with <txp:header /> is better. But I cannot figure out how to make the tag send a 410 (that's “permanently gone”, so maybe better than 404). The docs say it has name and value attributes. I tried a lot but I have no clue what to put in there to get a 410.

Additionally I guess it would be good to adjust the feed.
Google does not like confusion. I would not want to offer google a URL in the feed and then give it a 410 on that.

Last edited by demoncleaner (2022-08-18 12:18:59)

Offline

#12 2022-08-18 12:50:14

etc
Developer
Registered: 2010-11-11
Posts: 5,237
Website GitHub

Re: Avoid single article URLs like /section/title

demoncleaner wrote #333803:

But that throws a 503 error (Service unavailable).

Yes, it’s the default status, but you can change it: <txp:txp_die status="404" />
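
Put together with the conditional from the earlier post, the header part of the template might then look something like this. It is a sketch that assumes "blog" is the only section with public article pages:

```html
<!-- Near the top of the page template or head form -->
<txp:if_section name="blog" not>
    <txp:if_individual_article>
        <!-- Any /section/title or index.php?id=nn request outside the blog gets a 404 -->
        <txp:txp_die status="404" />
    </txp:if_individual_article>
</txp:if_section>
```

Presumably a status of 410 could be passed in the same way, but that is an assumption that has not been checked against the tag documentation here.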

Additionally I guess it would be good to adjust the feed.

Why wouldn’t you just exclude these sections from syndication?

Offline
