Textpattern CMS support forum

#25 2020-05-23 14:54:15

hilaryaq
Plugin Author
Registered: 2006-08-20
Posts: 335
Website

Re: Duplicate Content due to section and article URL

I get you!

It would be enough in my opinion too. The other way to think of it is that, even if it is too late and the pages have already been indexed, Google will self-correct to your preferred URL over time as long as you submit the sitemap, remove any internal links or RSS entries pointing to the hidden page, and consistently link to the section page from social media etc. If Google saw the pages as duplicates and preferred the article over the section, that situation can be reversed if you set your own canonical correctly and do all of the above. Eventually that single article will fall away from the results and Google will index the section page instead, especially when you reinforce it with your canonical and internal site structure.

A lot of websites and CMSs have multiple ways of pointing to a single URL, whether it’s with or without a trailing slash, URLs with parameters etc., so Google is quite used to being flexible as long as you are consistent with what you link to and send traffic to and, most importantly, with your canonicals.

Hope this helped.

You might get some value out of this page also: https://forum.textpattern.com/viewtopic.php?id=50705

A few methods there on outputting single pages and how to differentiate between them you might find handy :)


…………………
I <3 txp
…………………

#26 2020-05-23 17:23:08

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: Duplicate Content due to section and article URL

demoncleaner wrote #323195:

In case the meta url field of an article is filled, the article exists as a URL; in case it is empty, the URL (/section/title) just does not exist and the articles can be used without any further worries about the sitemap

Ha, no sadly it’s not that simple. But how about this as a workaround… in Txp 4.8.0 we officially support pageless sections. Go ahead and try it:

  1. Pick (or create) a section that is going to house content you don’t want anyone to be able to reach from a URL. I often make a section called “Snippets” for this purpose to store content that admins can edit but doesn’t need its own page. In your case, how about you call it “bios”?
  2. Edit that section and assign the ‘empty’ page at the top of the list to it. You can assign the empty stylesheet too if you like, though it’s not necessary as the empty page alone will trigger this behaviour.
  3. Save. All content in that section is now invisible! No page = no template = no output.

Thus you cannot access any of that content directly. Google will get a 404 if it tries; so will everybody else. Previously, content you didn’t want publicly accessible had to be stored in forms (or in other clever ways), which meant that regular editors like Staff Writer or Copy Editor couldn’t change it easily.

That’s great, you say: invisible content. So how do I get at it then? Easy: from any other section that does have a page template (e.g. in your case “Teams”) you can just pull whatever you need in via <txp:article_custom section="bios" />. If you categorise your content in that section, you can even choose to pull out stuff in certain categories, or certain IDs or articles that have certain custom fields set.

Thus your ‘landing page’ gets all the content in one big wodge, but as the individual articles have no physical URL, there is no way they can be indexed by anyone. And any author with access to the Write panel can alter the content in these ‘hidden’ pageless sections.
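
A minimal sketch of that pull-in, for a page template assigned to a visible section. The category name "management", the sort order and the wrapping markup are assumptions for illustration only; "bios" is the pageless section from the steps above.

```html
<!-- In the page template of a visible section, e.g. "teams" -->
<txp:article_custom section="bios" category="management" sort="Title asc" limit="99">
    <article class="bio">
        <h2><txp:title /></h2>
        <txp:body />
    </article>
</txp:article_custom>
```

Dropping the category attribute should pull in every live article from the section, up to the limit.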

Does that help?

Last edited by Bloke (2020-05-23 17:28:18)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#27 2020-05-23 17:40:51

hilaryaq
Plugin Author
Registered: 2006-08-20
Posts: 335
Website

Re: Duplicate Content due to section and article URL

Stef that’s so handy for so many uses!!


#28 2020-05-23 19:11:42

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Duplicate Content due to section and article URL

Indeed, that does help! Thanks a lot, Stef. I am quite often surprised by how little of Textpattern’s potential I know.

That will help with my problem, because I could create a section “team” (to be accessed) and show on it all the articles I want from the section “bios”, which is an invisible section, for example. That makes for a few more sections than a typical website needed before, but that would be totally fine. And it seems to me to be the best solution for the orphaned articles.

It also helps with my multi-language pages, which I have lately been creating with a mixture of smd_query and adi_menu. There I typically use an article that contains all the single words or phrases of the website that are not big enough to warrant an article of their own, and I attach it via article_custom at the top of each page. I call it “language snippets” and put it in a section of the same name, because it is more convenient for the client to just jump into the articles and edit the language snippets from there. I usually hide output forms from them: all too complicated.

With this new technique I do not have to worry about the section I created, because it is an invisible section.

Thanks a lot. It’s brilliant.

#29 2020-05-23 19:29:22

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: Duplicate Content due to section and article URL

demoncleaner wrote #323212:

It also helps with my multi-language pages, which I have lately been creating with a mixture of smd_query and adi_menu. There I typically use an article that contains all the single words or phrases of the website that are not big enough to warrant an article of their own, and I attach it via article_custom at the top of each page. I call it “language snippets” and put it in a section of the same name.

In that case, you might be interested in Oleg’s novel approach to multi-lingual content in Txp 4.8. He’s documented it there but it’s basically content in pageless sections chained to articles in a visible section via a custom field to offer ‘translations’ of that content.

Until we nail true multi-lingual content in a future core version, this is the next best thing.


#30 2020-05-23 20:14:34

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Duplicate Content due to section and article URL

Cool, thanks. I’ll have a closer look. I am definitely interested.
But from what I see now, I think the URLs are not “speaking” URLs in Oleg’s version, right?

In my approach I use arc_meta (not adi_menu as written above) to make the description field on the sections redundant. Then I use this field to hold an internal common name for a section across all the languages. Then I create an output form that uses smd_query to find the counterpart of the current section in each other language. It works pretty well, with not too much fiddling once you have a decent setup. With the help of the CSS select box on each section I define its language, so that the section is aware of its own language. That way you can easily go from /kontakt straight to /contact when clicking on “en” on the German version etc. I could explain it in more detail if that is of any interest, but maybe not in this thread because it is already quite off-topic. Sorry.

#31 2020-05-23 21:19:28

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,578
Website

Re: Duplicate Content due to section and article URL

An interesting thread that I’ve only just noticed. Not sure if I’ve absorbed everything, so apologies for duplication.

I agree, it’s certainly a good idea to avoid duplicate content, or at least not to routinely ignore it happening, but I’m not sure whether Google penalises the occasional occurrence of duplicate content – especially if discovered by some roundabout method – and then deliberately ignores a meta robots set to none and the specified preferred canonical url. I thought the canonical url was specifically to cover such situations, because “it can happen” that the same content is reachable via different paths. The penalisation thing is aimed at wanton misuse.

A more serious problem is, I think:

  • end points on a site that don’t make sense on their own, like team profiles without the context of a team page, a services snippet or a tumblr-like quote post not in its stream.
  • discovery of a site that’s not meant to be online yet.

I do something similar to what Hilary wrote:

  • make sure you don’t provide links (also not in sitemap + rss)
  • tailor canonical_url output to only the desired outcomes
  • set meta robots to none on pages that should not be crawled
  • the redirect method also works as a brute-force method. You can use arc_redirect for individual page redirects (an updated version is on my GitHub page) or smd_redirect when using regex. Your header method works too. On older versions of txp you can do that with a txp:php snippet that does the same as txp:header location.
  • .htaccess redirections: you can do these for combinations of several (explicitly named) sections using (section-a|section-b|section-c) in your regex. You can also use RewriteCond to make a general RewriteRule for all but a few excluded URLs (see how the HTML5 Boilerplate redirects an entire site to https with the exception of certain paths here). However, if you find yourself rewriting the majority of your sections, you might want to consider switching to a /Title-only URL scheme from the get-go. In txp 4.8+ you can set that as the default and then apply /section/title to only certain sections.

If you have a site with static pages and multiple article sections, you can also use txp:page_url type="page" to get the current page template in use and modify canonical url and output accordingly, e.g.

<txp:variable name="page_template"><txp:page_url type="page" /></txp:variable>
<txp:if_variable name="page_template" value="single">
    <!-- this is a section that uses a static single-page template -->
    <meta name="robots" content="noindex, nofollow">
    <link rel="canonical" href="<txp:site_url /><txp:section />/">
<txp:else />
    <!-- this is a regular section with articles -->
    <meta name="robots" content="index, follow">
    <link rel="canonical" href="<txp:permlink />">
</txp:if_variable>

That avoids you having to specify sections explicitly, which is good for themes.

Because that gets the page_template for the current url in the browser, that method doesn’t work for section_list loops, which you may find you need when creating a sitemap. I solved it this way:

<!-- inside a section_list loop -->
<txp:variable name="page_template"><txp:php>global $thissection; echo safe_field("page", 'txp_section', "name = '".doSlash($thissection['name'])."'");</txp:php></txp:variable>
<txp:if_variable name="page_template" value="single" not>
    <!-- your article_custom for sections with multiple articles -->
</txp:if_variable>
…

… but it requires more database lookups (tolerable for a once-in-a-while sitemap lookup). If the page_template was part of the $thissection array, it would be simpler.

@Oleg, is that feasible?

What I also do to avoid a site being accidentally pre-discovered on a staging site before it’s ready to go live is to set a txp:variable at the beginning of the page called public_domain. This holds the desired intended location of the target site, i.e. the “live site”. Compare that against txp:site_url and if it doesn’t match (i.e. on your staging or demo site), set meta robots to none.
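
That check might look something like the following sketch, placed in the page head. The variable name public_domain comes from the description above; current_domain and the example.com address are placeholders. The comparison is a plain string match, so the value has to be written exactly as txp:site_url outputs it (protocol and trailing slash included).

```html
<!-- Hypothetical live address: replace with the real one -->
<txp:variable name="public_domain" value="https://www.example.com/" />
<txp:variable name="current_domain"><txp:site_url /></txp:variable>

<txp:if_variable name="current_domain" value='<txp:variable name="public_domain" />'>
    <!-- live site: let crawlers in -->
    <meta name="robots" content="index, follow">
<txp:else />
    <!-- staging or demo site: keep crawlers out -->
    <meta name="robots" content="none">
</txp:if_variable>
```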

demoncleaner wrote #323214:

In my approach I use arc_meta (not adi_menu as written above) to make the description field on the sections redundant. Then I use this field to hold an internal common name for a section across all the languages. Then I create an output form that uses smd_query to find the counterpart of the current section in each other language. It works pretty well, with not too much fiddling once you have a decent setup. With the help of the CSS select box on each section I define its language, so that the section is aware of its own language. That way you can easily go from /kontakt straight to /contact when clicking on “en” on the German version etc. I could explain it in more detail if that is of any interest, but maybe not in this thread because it is already quite off-topic. Sorry.

I’d love to hear how that works in detail (in a new thread ;-) I’ve tried various setups over the years depending on the site complexity and still haven’t settled on an optimal solution.


TXP Builders – finely-crafted code, design and txp

#32 2020-05-23 21:31:31

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Duplicate Content due to section and article URL

jakob wrote #323215:

… but it requires more database lookups (tolerable for a once-in-a-while sitemap lookup). If the page_template was part of the $thissection array, it would be simpler.

@Oleg, is that feasible?

In 4.8 we have global $txp_sections:

array (
  'default' => 
  array (
    'name' => 'default',
    'skin' => 'future-imperfect',
    'page' => 'default',
    'css' => 'default',
    'description' => '',
    'in_rss' => '1',
    'on_frontpage' => '1',
    'searchable' => '1',
    'title' => 'Default',
    'permlink_mode' => '',
    'dev_skin' => 'future-imperfect',
    'dev_page' => '',
    'dev_css' => '',
  ),
  'articles' => 
  array (
    'name' => 'articles',
    'skin' => 'four-point-seven',
    'page' => 'default',
    'css' => 'default',
    'description' => 'Regular articles, baby.',
    'in_rss' => '1',
    'on_frontpage' => '1',
    'searchable' => '1',
    'title' => 'Articles',
    'permlink_mode' => '',
    'dev_skin' => 'four-point-seven',
    'dev_page' => 'default',
    'dev_css' => '',
  )
)

So $txp_sections[$thissection['name']]['page'] should work. But relying on txp internals is risky.
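
Inside a section_list loop, that lookup could replace the database query along these lines. This is only a sketch: it assumes Txp 4.8, that $txp_sections is keyed by section name as in the dump above, and that page_template is the variable name used earlier in the thread. As noted, it leans on internals that may change.

```html
<!-- inside a section_list loop -->
<txp:variable name="page_template"><txp:php>
    global $thissection, $txp_sections;
    // Section currently being iterated by section_list
    $name = isset($thissection['name']) ? $thissection['name'] : '';
    // Page template assigned to it; empty string if the entry is missing
    echo isset($txp_sections[$name]['page']) ? $txp_sections[$name]['page'] : '';
</txp:php></txp:variable>
```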

#33 2020-05-23 21:36:45

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Duplicate Content due to section and article URL

Also,

<link rel="canonical" href="<txp:site_url /><txp:section />/">

should be doable as

<link rel="canonical" href='<txp:page_url context="section" />'>

#34 2020-05-24 06:42:46

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Duplicate Content due to section and article URL

I’d love to hear how that works in detail (in a new thread ;-) I’ve tried various setups over the years depending on the site complexity and still haven’t settled on an optimal solution.

You can find my multi-language approach here.

I am curious what you think.

#35 2020-05-25 06:48:41

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 220
Website

Re: Duplicate Content due to section and article URL

Bloke wrote #323202:

  1. Edit that section and assign the ‘empty’ page at the top of the list to it. You can assign the empty stylesheet too if you like, though it’s not necessary as the empty page alone will trigger this behaviour.

Just discovered that applying ‘empty’ pages in bulk from the sections menu seems not to work.
Might that be a bug?

Doing it for a single section works fine, and it is a super nice feature.

#36 2020-05-25 08:00:49

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: Duplicate Content due to section and article URL

demoncleaner wrote #323233:

Just discovered that applying ‘empty’ pages in bulk from the sections menu seems not to work. Might that be a bug?

It’s designed like that. In the multi-edit action, the empty option means “leave the page (or stylesheet) as it is”. This allows you to change just one or the other without needing to do multiple multi-edit actions when you only want to update, say, a stylesheet on many sections and leave the various pages intact. The only way to set a section to pageless is to do it from the Edit panel.

