Textpattern CMS support forum

#31 2020-05-23 21:19:28

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 3,807

Re: Duplicate Content due to section and article URL

An interesting thread that I’ve only just noticed. Not sure if I’ve absorbed everything, so apologies for duplication.

I agree that it’s a good idea to avoid duplicate content, or at least not to routinely ignore it when it happens. But I doubt that Google penalises the occasional occurrence of duplicate content, especially if it was discovered by some roundabout method, and that it would deliberately ignore a meta robots set to none and the specified preferred canonical url. I thought the canonical url exists specifically to cover such situations, because “it can happen” that the same content is reachable via different paths. The penalties are aimed at wanton misuse.

A more serious problem is, I think:

  • end points on a site that don’t make sense on their own, like team profiles without the context of a team page, a services snippet or a tumblr-like quote post not in its stream.
  • discovery of a site that’s not meant to be online yet.

I do something similar to what Hilary wrote:

  • make sure you don’t provide links (also not in sitemap + rss)
  • tailor canonical_url output to only the desired outcomes
  • set meta robots to none on pages that should not be crawled
  • the redirect method also works as a brute-force method. You can use arc_redirect for individual page redirects (an updated version is on my GitHub page) or smd_redirect when using regex. Your header method works too. On older versions of txp you can do that with a txp:php snippet that does the same as txp:header location.
  • .htaccess redirects can cover combinations of several (explicitly named) sections by using (section-a|section-b|section-c) in your regex. You can also use RewriteCond to make a general RewriteRule for all but a few excluded urls (see how the HTML5 Boilerplate redirects an entire site to https with the exception of certain paths here). However, if you find yourself rewriting the majority of your sections, you might want to consider switching to a /Title-only url scheme from the get-go. In txp 4.8+ you can set that as default and then apply /section/title to only certain sections.
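
A minimal sketch of the txp:php redirect variant for older versions. It assumes clean URLs, that the snippet sits at the very top of the page template before any output, and that individual article URLs in the section should be sent to the section landing page. The 301 status is an illustrative choice, not something specified in this thread.

```html
<txp:if_individual_article>
<txp:php>
    // Assumption: individual article URLs in this section should
    // redirect to the section landing page (/section-name/).
    global $pretext;
    txp_status_header('301 Moved Permanently');
    header('Location: '.hu.urlencode($pretext['s']).'/');
    exit;
</txp:php>
</txp:if_individual_article>
```

This only works if PHP is allowed in pages in the site preferences.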

If you have a site with static pages and multiple article sections, you can also use txp:page_url type="page" to get the current page template in use and modify canonical url and output accordingly, e.g.

<txp:variable name="page_template"><txp:page_url type="page" /></txp:variable>
<txp:if_variable name="page_template" value="single">
    <!-- this is a section that uses a static single-page template -->
    <meta name="robots" content="noindex, nofollow">
    <link rel="canonical" href="<txp:site_url /><txp:section />/">
<txp:else />
    <!-- this is a regular section with articles -->
    <meta name="robots" content="index, follow">
    <link rel="canonical" href="<txp:permlink />">
</txp:if_variable>

That avoids you having to specify sections explicitly, which is good for themes.

Because that gets the page_template for the current url in the browser, that method doesn’t work for section_list loops, which you may find you need when creating a sitemap. I solved it this way:

<!-- inside a section_list loop -->
<txp:variable name="page_template"><txp:php>global $thissection; echo safe_field("page", 'txp_section', "name = '".doSlash($thissection['name'])."'");</txp:php></txp:variable>
<txp:if_variable name="page_template" value="single" not>
    <!-- your article_custom for sections with multiple articles -->
</txp:if_variable>
…

… but it requires more database lookups (tolerable for a once-in-a-while sitemap lookup). If the page_template was part of the $thissection array, it would be simpler.

@Oleg, is that feasible?

What I also do to avoid a site being accidentally discovered on a staging server before it’s ready to go live is to set a txp:variable called public_domain at the beginning of the page. This holds the intended location of the finished site, i.e. the “live site”. Compare that against txp:site_url and, if it doesn’t match (i.e. on your staging or demo site), set meta robots to none.
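
A minimal sketch of that staging check, assuming it is placed in the head of the page template. The address https://www.example.com/ is a placeholder for the live site, and current_domain is a helper variable introduced here only for the comparison. txp:site_url outputs the site address with a trailing slash, so the placeholder carries one too.

```html
<!-- Assumption: replace the placeholder with the real live-site address -->
<txp:variable name="public_domain" value="https://www.example.com/" />
<txp:variable name="current_domain"><txp:site_url /></txp:variable>

<txp:if_variable name="current_domain" value='<txp:variable name="public_domain" />'>
    <!-- live site: allow indexing -->
    <meta name="robots" content="index, follow">
<txp:else />
    <!-- staging or demo site: keep crawlers out -->
    <meta name="robots" content="none">
</txp:if_variable>
```

Because the comparison is an exact string match, the protocol (http or https) and any www prefix in the placeholder have to match what the live site actually reports.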

demoncleaner wrote #323214:

In my approach I use arc_meta (not adi_menu as written above) to make the field description on the sections obsolete. Then I use this field to have an internal common name for a section across all the languages. Then I create an output form that uses smd_query to find the counterpart of the current section in each other language. It works pretty well without too much fiddling once you have a decent setup. With the help of the css selectbox on each section I define its language, so that the section is aware of its own language. That way you can easily go from /kontakt straight to /contact when clicking on “en” while on the German version, etc. I could explain it in more detail if that is of any interest, but maybe not in this thread because it is already quite off-topic. Sorry.

I’d love to hear how that works in detail (in a new thread ;-) I’ve tried various setups over the years depending on the site complexity and still haven’t settled on an optimal solution.


TXP Builders – finely-crafted code, design and txp

#32 2020-05-23 21:31:31

etc
Developer
Registered: 2010-11-11
Posts: 3,650

jakob wrote #323215:

… but it requires more database lookups (tolerable for a once-in-a-while sitemap lookup). If the page_template was part of the $thissection array, it would be simpler.

@Oleg, is that feasible?

In 4.8 we have global $txp_sections:

array (
  'default' => 
  array (
    'name' => 'default',
    'skin' => 'future-imperfect',
    'page' => 'default',
    'css' => 'default',
    'description' => '',
    'in_rss' => '1',
    'on_frontpage' => '1',
    'searchable' => '1',
    'title' => 'Default',
    'permlink_mode' => '',
    'dev_skin' => 'future-imperfect',
    'dev_page' => '',
    'dev_css' => '',
  ),
  'articles' => 
  array (
    'name' => 'articles',
    'skin' => 'four-point-seven',
    'page' => 'default',
    'css' => 'default',
    'description' => 'Regular articles, baby.',
    'in_rss' => '1',
    'on_frontpage' => '1',
    'searchable' => '1',
    'title' => 'Articles',
    'permlink_mode' => '',
    'dev_skin' => 'four-point-seven',
    'dev_page' => 'default',
    'dev_css' => '',
  )
)

So $txp_sections[$thissection['name']]['page'] should work. But relying on txp internals is risky.
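
Inside a section_list loop, that lookup could stand in for the safe_field() query along these lines. This is only a sketch: it assumes Textpattern 4.8+, where the global is available, it reuses the page template name single from the earlier example, and it carries the same caveat about relying on internals.

```html
<!-- inside a section_list loop -->
<txp:variable name="page_template"><txp:php>
    global $thissection, $txp_sections;
    $name = isset($thissection['name']) ? $thissection['name'] : '';
    // Read the page template from the global sections array
    // instead of querying the txp_section table for each section.
    echo isset($txp_sections[$name]['page']) ? $txp_sections[$name]['page'] : '';
</txp:php></txp:variable>
<txp:if_variable name="page_template" value="single" not>
    <!-- your article_custom for sections with multiple articles -->
</txp:if_variable>
```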

#33 2020-05-23 21:36:45

etc
Developer
Registered: 2010-11-11
Posts: 3,650

Also,

<link rel="canonical" href="<txp:site_url /><txp:section />/">

should be doable as

<link rel="canonical" href='<txp:page_url context="section" />'>

#34 2020-05-24 06:42:46

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 104

I’d love to hear how that works in detail (in a new thread ;-) I’ve tried various setups over the years depending on the site complexity and still haven’t settled on an optimal solution.

You can find my multilanguage approach here.

I am curious what you think.

#35 2020-05-25 06:48:41

demoncleaner
Plugin Author
From: Germany
Registered: 2008-06-29
Posts: 104

Bloke wrote #323202:

  1. Edit that section and assign the ‘empty’ page at the top of the list to it. You can assign the empty stylesheet too if you like, though it’s not necessary as the empty page alone will trigger this behaviour.

Just discovered that applying ‘empty’ pages in bulk from the sections menu seems not to work.
Might that be a bug?

It works fine when editing a single section and is a super nice feature.

#36 2020-05-25 08:00:49

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 9,282

demoncleaner wrote #323233:

Just discovered that applying ‘empty’ pages in bulk from the sections menu seems not to work. Might that be a bug?

It’s designed like that. In the multi-edit dropdown, the empty option means “leave the page (or stylesheet) as it is”. This allows you to change just one or the other without needing to do multiple multi-edit actions when you only want to update, say, a stylesheet on many sections and leave the various pages intact. The only way to make a section pageless is from the Edit panel.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#37 2020-05-25 11:02:48

etc
Developer
Registered: 2010-11-11
Posts: 3,650

We could probably add a None option to the multi-edit widget?

#38 2020-05-25 12:16:49

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 9,282

etc wrote #323235:

We could probably add a None option to the multi-edit widget?

That would work. As long as the dropdowns are consistent between “None” (pageless) and “leave alone” on list and edit steps.

If we just label the value="" as None on the edit step, how do we handle leave_blank on the section list step? They’d both have value="" right?


#39 2020-05-25 12:22:12

etc
Developer
Registered: 2010-11-11
Posts: 3,650

Bloke wrote #323239:

If we just label the value="" as None on the edit step, how do we handle leave_blank on the section list step? They’d both have value="" right?

We could give another value, e.g. * to leave it alone?

#40 2020-05-25 12:24:07

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 9,282

etc wrote #323246:

We could give another value, e.g. * to leave it alone?

That’d work. This could sneak into 4.8.1 as it’s not too invasive and is an enhancement/bug fix to the theme changes. If you have time :)

