Textpattern CMS support forum

#1 2020-05-18 20:45:37

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220

Duplicate Content due to section and article URL

Today I stumbled over something that I had never really thought about, but now that I see it I am afraid it’s a big issue with how I am structuring my pages in Textpattern.

Let’s say I have a section called “Team”. The URL would be www.mydomain.com/team/
In this section I have just one article called “this-is-my-team”. Its section is “team”, obviously. There could be more articles, but for this example let’s say it’s just one.

That automatically creates the URL www.mydomain.com/team/this-is-my-team

So now I have two different URLs with the same content, right? Even though I could make sure that www.mydomain.com/team/this-is-my-team is not in my sitemap and is not linked from any of my pages, I just learned that Google could still find it.

Am I missing something? Is there an easy way to avoid that? Am I building my pages the wrong way?

#2 2020-05-19 00:19:12

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250

Nothing you’re doing wrong per se. It’s the nature of the beast with “static” sections. A few ways to approach this, singly or in tandem:

  1. Use a canonical link in your <head> to declare one of the two URLs as the de facto link.
  2. If you have a static (one-page) template, make sure it doesn’t permlink the heading. Normally the default form (or page) adds a permlink around the heading so that article list links can be followed. Bypass that either by checking which page template is in use or, perhaps better, by using a dedicated page/form for single-page sections.
  3. Use redirects. smd_redirect, for example, allows you to set up a 301 bounce from one to the other. Even using a regex if you like. So in the event someone unearths your individual article link, you can just redirect them to the landing page and show the article.
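
For option 1, a minimal sketch of what the canonical link could look like in the <head> of a page template used only for single-article sections. It assumes that template is assigned to the section and that <txp:section url="1" /> outputs the current section's URL, so both /team/ and /team/this-is-my-team would declare the section URL as canonical. Untested; adjust to your own markup.

```html
<!-- Page template for static, single-article sections (assumed setup). -->
<!-- The section landing page and the article permlink both name the section URL. -->
<link rel="canonical" href="<txp:section url="1" />" />
```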

Hope that helps.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#3 2020-05-19 04:44:33

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007

Hi, I had the same issue. There are ways to prevent search engines from listing both pages, and a way to omit the individual article from appearing in the search results.


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#4 2020-05-19 05:57:59

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220

First of all: thanks a lot to you both for answering my question so quickly.
Second: I am really glad that I have not been misunderstanding the whole principle for the last couple of years. =)

So tweaking the sitemap so that it does not list any article URLs such as /section/title where I don’t want them would be easy. Sometimes I use rah_sitemap, for example, where this can be easily achieved. But this alone will not solve the problem.

Using canonicals can also help. I guess I would have to build some kind of dynamic thing that generally strips the /title from the current URL when I am on a single_article page, and create exceptions where needed. But again, I was told by an SEO expert that using canonicals alone will not necessarily prevent Google from “seeing” and indexing the unwanted URL as well. Not sure how valid this statement is.

So I guess the only thing that really helps is the use of smd_redirect.
I was hoping that there is a more elegant way to prevent single_article URLs from “working”. You would definitely have to work with regex so that not every new article on a /news/ URL needs a new redirect.

By the way:
I am not using the default page from Txp. What I usually do on my pages is include the header and the footer via PHP, as they are usually the same. In the middle goes the “core” template, which can differ depending on what I need. That way I am also able to edit everything separately in my editor and not directly in Txp.

Last but not least, let me ask what a scenario with non-static sections would look like. What would be the way? Do you mean by using categories, for example? It sounded like there would be another way instead of working with a section for each main menu item.

Last edited by demoncleaner (2020-05-19 05:59:05)

#5 2020-05-19 07:32:09

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007

demoncleaner wrote #323034:

I am not using the default page from txp. What I usually do on my pages is I include via php the header and the footer. Which are usually the same. In the middle goes the “core” template that can differ depending on what I need. Like that I am also able to edit everything separately in my editor and not directly in txp.

You could of course achieve the same result by using output_form for the header and footer and, as of the previous stable release, you can work in your favourite editor to construct themes.

Last but not least, let me ask how a scenario would look like with non-static sections. What would be the way? Do you mean by using categories for example? It sounded like there would be another way instead of working with a section for each main menu point.

It is up to you. If it is a blog type site, where you have linked titles and excerpts for example, you could use a meta-tag to prevent the listing of those pages <meta name="Robots" content="noindex,follow" />.
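
A minimal sketch of that first option, assuming it goes in the <head> of the page template and that list pages (section, category and search listings) are the ones to keep out of the index while individual articles stay indexable. Untested:

```html
<txp:if_individual_article>
    <!-- Individual article: let search engines index it -->
    <meta name="robots" content="index,follow" />
<txp:else />
    <!-- Any list page: follow the links but do not index the list itself -->
    <meta name="robots" content="noindex,follow" />
</txp:if_individual_article>
```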

I normally do not like that, so I use

<meta name="Robots" content="index,follow" />
<meta name="revisit-after" content="10 days" />

#6 2020-05-19 08:40:35

Vienuolis
Member
From: Vilnius, Lithuania
Registered: 2009-06-14
Posts: 307

In general, there should be two different forms for the same txp:article: one form="" for the single article webpage, and another listform="" for the corresponding item in the index of its section, where a bibliographic description alone (ID, date, title, keywords, and summary) should be sufficient. It is a bad (although very popular) practice to duplicate the full article content on its section (and category, author, etc.) pages, too.
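
A rough sketch of that split. The form names "article_full" and "article_listing" are hypothetical, and the markup inside each form is only an illustration of which fields might go where:

```html
<!-- In the page template: one tag, two forms -->
<txp:article form="article_full" listform="article_listing" />

<!-- Form "article_listing": short description that links to the full article -->
<h2><txp:permlink><txp:title /></txp:permlink></h2>
<p><txp:posted /></p>
<txp:excerpt />

<!-- Form "article_full": complete content, used only on the individual article page -->
<h1><txp:title /></h1>
<txp:body />
```

With this arrangement the full body appears only at the article's own URL, so the section index does not repeat it.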

#7 2020-05-19 09:19:11

Vienuolis
Member
From: Vilnius, Lithuania
Registered: 2009-06-14
Posts: 307

For important and mostly static publications I usually write a sticky article with a manually created table of contents, and then completely hide its section URL, replacing it with the url-title of an article:

Match ^/mysection/$ Redirect /myarticle

(you could use a similar rewrite rule in Apache .htaccess or Stef’s plugin smd_redirect for Textpattern instead).

For example, on az.on.lt only news reports are listed in a blog-like section; all other 6+ sections are hidden there in favour of manual TOCs.

Note: use this method only where you are the only writer; it is also not suitable for serials.

Last edited by Vienuolis (2020-05-19 09:20:28)

#8 2020-05-19 09:34:54

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250

demoncleaner wrote #323034:

tweaking the sitemap so that it will not show any URLs to article pages such as /section/title where I don´t want it, would be easy.

This becomes even more flexible now you don’t need a plugin for it. Easy to do in core with a form.

I guess I would have to make myself some kind of dynamic thing that in general strips the /title from the current url in case I am on a single_article page.

You may not have to go that far. It’s easier to prevent those links from appearing in your site structure than it is to manipulate the URL after the fact.

I was told by a seo expert that using canonicals alone will not necessarily avoid that google might “see” and index the unwanted URL as well.

Then they shouldn’t! I suspect this info is bogus or a misconfiguration. Google advocate using canonicals.

I was hoping that there is a more elegant way to avoid that single_article urls “work”.

There is! I use a dedicated page template for single-article URLs. As you say, by “including” header and footer via <txp:output_form> and farming content out to forms wherever possible for the meat of the page, the majority of the page template will be the same, which reduces maintenance long term.

The beauty of this approach is:

  • your template can be simpler than the one for the front page (as it doesn’t have to handle author and category landing pages, or search results)
  • you can use a different form for your articles, which deliberately doesn’t add permlinks to headings
  • you can spit out the article body and excerpt on landing pages. No individual article permlinks, nothing to index, no duplicate content
  • you can reuse it for many sections that require single static article content to be shown.
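
A minimal sketch of such a dedicated page template. The form names "header", "footer" and "static_article" are hypothetical; the point is only that the article form prints the heading without a permlink around it:

```html
<!-- Page template assigned to every static section -->
<txp:output_form form="header" />

<!-- Shows the section's article(s) directly on the landing page -->
<txp:article form="static_article" />

<txp:output_form form="footer" />

<!-- Form "static_article": plain heading, no <txp:permlink> wrapper -->
<h1><txp:title /></h1>
<txp:excerpt />
<txp:body />
```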

to avoid that on a /news/ url every new news article needs a new redirect.

Not sure I follow. Why would you not want search engines to spider every article in a /news section? Surely that’s the whole point!

I am also able to edit everything separately in my editor and not directly in txp.

As colak says, from 4.7+ you can adopt that workflow and either import the changes into Txp as your current or development theme, or use etc_flat to automatically update the database as you work on the files.

Last but not least, let me ask how a scenario would look like with non-static sections.

This works in the usual way. A landing page with a list of articles that you may choose to allow robots to index or not. And each article in the list links to its individual article that has a canonical URL specifying it is the de facto URL for that content.

If you look at our default page templates you’ll see the approach Phil has taken to show/hide various content from search engines based on its type. This is best practice.

Hope that helps.


#9 2020-05-20 14:48:54

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220

Thanks for taking all that effort to answer my question.
I feel that there is still some kind of misunderstanding, maybe due to my English. Sorry.

No plugin needed for a more manual sitemap: Got it!

SEO “expert’s” opinion on canonicals: Check (just forget about this bogus claim!)

Stripping /title from the url:
Maybe not the best idea but can be achieved through .htaccess by

RewriteRule ^(mysection/). /$1 [R,L]

But I only found a solution that would strip and redirect it for each particular section. So not very elegant.

Something like:

RewriteRule (\/[a-z\-]+\/).+ $1 [R,L]

Of course that would not work, because it would redirect the asset folders as well.

single_article pages: Yes, I also use a dedicated page for this type. So I know about the advantages.

Working in a separate editor of my choice: That’s just fine as it is for me. It is not really related to the topic.

So let me put all this aside and come back to the main problem I have.
Let me try to give another example:

Let’s say I have a section called “news”.
In it I have 10 news entries.
Their titles are “title1”, “title2”, “title3” etc.

So what happens is that each article can be reached either through the URL mydomain.de/news/ (along with the other entries) or through mydomain.de/news/title1 etc.

Now, I do not want to use excerpts or work with overview and detail pages. It is just one news page with 10 entries, but each entry is an article and has the section “news”.

I am not using the default page template but, if I see that correctly, mydomain.de/news/title1 will not get a canonical. By default it will also not get a robots “noindex,follow”. I might be able to exclude mydomain.de/news/title1 from the sitemap, and there might be no permlink anywhere, so human users will never see mydomain.de/news/title1.
But I was told (again by an expert) that under some conditions search engines might still find mydomain.de/news/title1, which would be bad. Would that also be bogus? Or am I missing something somewhere else? Is my approach to what I want to achieve not a good one?

So my solution so far would be either to have .htaccess redirects like the ones above or to give articles like “title1” a “noindex,follow”.
Neither is a good solution: redirects are not elegant and fail when someone creates a new section, and putting “noindex,follow” on each new article can easily be forgotten.

I hope I have explained it better now.

#10 2020-05-20 15:14:19

etc
Developer
Registered: 2010-11-11
Posts: 5,028

demoncleaner wrote #323058:

Stripping /title from the url:
Maybe not the best idea but can be achieved through .htaccess by

RewriteRule ^(mysection/). /$1 [R,L]

But I only found a solution that would strip and redirect it for each particular section. So not very elegant.

You can try

<txp:if_individual_article>
    <txp:header name value="301" />
    <txp:header name="Location" value='<txp:section url />' />
</txp:if_individual_article>

in the ‘individual’ part of your article form. This will give a 302 instead of 301, but should be tweakable.

There actually are other solutions: sticky, themeless articles, other headers, but I’m not an SEO expert.

#11 2020-05-20 15:23:54

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007

So now I do not want to use excerpts and work with overview and detail pages. It is just one news site with 10 entries but each entry is an article and has the section “news”.

This will basically mean that no article will actually be able to be referenced anywhere as it might reside on the 1st page of your news section today, the second page tomorrow, and so on.

Maybe you can tell us slightly more about the project. Are you doing it for yourself or a client? I.e. will you be the webmaster, or will it be managed by somebody else over whom you will have no control? Also please tell us more about how you envisage the structure.


#12 2020-05-20 15:33:33

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220

Thank you so much etc! This was exactly what I needed. That way I can redirect all sections in which I do not want single_articles to be accessible, and other sections can keep that feature if needed. Perfect!
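
A sketch of how that per-section restriction might look, reusing etc’s two header tags from post #10 unchanged and wrapping them in a section check. The section names "team" and "service" are only examples, and this is untested:

```html
<txp:if_individual_article>
    <!-- Redirect only in sections that should not have single-article pages -->
    <txp:if_section name="team, service">
        <txp:header name value="301" />
        <txp:header name="Location" value='<txp:section url />' />
    </txp:if_section>
</txp:if_individual_article>
```

Sections that are not listed, such as News, would keep their individual article pages.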

@Yiannis: I am aware of the character of a news section, and that it will usually have pagination once it has more entries. I was just using it as an example. Maybe a bad one, sorry.

It is also not about one particular project. It is more about all the projects I have done in the past, because I was not really aware that this could be an issue.

Let me give you another example of what a typical site that I usually build looks like.

Let’s say it has the menu structure:

Home | News | Service | Team | Contact

On News, as you said, you would want to use excerpts and single article pages.
On Team you might have 12 team members, each with a little text, but they do not have any permlink or any further information to link to. So what you want is that the URL /team/donald_duck is not seen or indexed and does not create duplicate content, right?

This is what it was all about. I was asking myself whether my way of structuring things in Textpattern is not ideal, whether my understanding of how search engines react to it is wrong, or whether Textpattern maybe lacks an easy way of dealing with this issue.

But for me etc gave the perfect workaround now. Thanks again.
