
Textpattern CMS support forum


#13 2020-07-02 13:16:56

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Duplicate content

etc wrote #324189:

Try <txp:txp_die status="404" />?

As you said, try “at your own risk.” Unfortunately it does not work as desired on deeper URL schemes (section/categories/article): it returns a 404 when landing on /section/cat1/cat2/ pages.

Last edited by colak (2020-07-02 13:18:11)


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.


#14 2020-07-02 13:21:27

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,134
GitHub

Re: Duplicate content

I will admit to being a bit baffled by this – no antagonistic intentions, just trying to understand.

If you want search engines to have non-duplicate content, totally understandable – that’s what canonical is for. I’m a bit fuzzy on where the 503 / 404 errors are useful – does this mean anyone with a link to a page on your site that’s considered duplicate content gets an error instead of the page content, or is it intended as a search engine housekeeping exercise so any dupes will get rinsed out on the next run?

Last edited by gaekwad (2020-07-02 13:22:56)


#15 2020-07-02 14:40:47

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Duplicate content

gaekwad wrote #324199:

I will admit to being a bit baffled by this – no antagonistic intentions, just trying to understand.

If you want search engines to have non-duplicate content, totally understandable – that’s what canonical is for. I’m a bit fuzzy on where the 503 / 404 errors are useful – does this mean anyone with a link to a page on your site that’s considered duplicate content gets an error instead of the page content, or is it intended as a search engine housekeeping exercise so any dupes will get rinsed out on the next run?

Hi Pete,

There is an issue with the way txp understands the desired canonical URL.

For example when using

<link rel="canonical" href="<txp:site_url trim="/" /><txp:page_url />" />

in the head of the document, it will parse the tags whatever the requested URL is, so the canonical link simply echoes that URL, which makes it semantically wrong. So, what would the recommendation be in order to have the desired URL in there? Also, the breadcrumbs tag returns URLs that are again not the desired ones.

I agree with many people in this community regarding the issues plaguing the search engines, but at the same time we all desire traffic from beyond those we have emailed or given our card to. As such, we are engaged in a Hegelian master–slave dialectic that we cannot escape from, i.e. there is no master unless recognised by the slaves.

In my view, maybe because I vividly remember and believe in the 90s web, search engines are the most appropriate places to have our work discovered. The idea should not be to destroy Google but to apply pressure in order to get it fixed.

As such, and having personally accepted that my relationship with Google and other search engines is a relationship of connivance, I am trying, like many others, to play ball and serve our content in a way which would increase our visibility (not our SEO).



#16 2020-07-02 14:48:09

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,134
GitHub

Re: Duplicate content

Thanks very much, Yiannis – now I understand what you mean. It took me a few reads to fully grok it (and that’s my brain, not your words), but that explains the situation.


#17 2020-07-02 14:51:52

zero
Member
From: Lancashire
Registered: 2004-04-19
Posts: 1,470
Website

Re: Duplicate content

gaekwad wrote #324199:

I will admit to being a bit baffled by this – no antagonistic intentions, just trying to understand.

If you want search engines to have non-duplicate content, totally understandable – that’s what canonical is for. I’m a bit fuzzy on where the 503 / 404 errors are useful – does this mean anyone with a link to a page on your site that’s considered duplicate content gets an error instead of the page content, or is it intended as a search engine housekeeping exercise so any dupes will get rinsed out on the next run?

I’m always a bit fuzzy with coding, sorry. When I wrote earlier ?=* I meant also ?c=* and ?whatever so sorry to have caused you to be more verbose, but thanks because I learned something from it.

Re 503 / 404, both your reasons above are good ones, imho.

Re canonical to sort out duplicate content, yes fine, that’s what it’s for, but this is an extra precaution, and I think it’s cleaner to try to get rid of all duplicate content in the first place. G are always changing their algorithms and who knows what hoops they’ll want us to jump through in the future?

Sorry the evaluate code doesn’t work for you, Yiannis. I’m using it on a simple site with just the /title URL pattern. I have it in the head of both the default and archive pages and it’s perfect for my needs.


BB6 Band My band
Gud One My blog


#18 2020-07-02 15:22:47

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,134
GitHub

Re: Duplicate content

zero wrote #324202:

When I wrote earlier ?=* I meant also ?c=* and ?whatever so sorry to have caused you to be more verbose, but thanks because I learned something from it.

Ah! Got it. Thanks for clarifying. Certainly no need to apologise, either.


#19 2020-07-03 06:06:43

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Duplicate content

Maybe I should bump this thread.

When using

<link rel="canonical" href="<txp:site_url trim="/" /><txp:page_url />" />

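in the head of the document, it will understandably parse the tags whatever the requested URL is, which makes the canonical link semantically wrong.

i.e. it will render any of the following, although only some of them are the desired canonical form: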


http(s)://domain.tld/section-name/ <-- correct
http(s)://domain.tld/?s=section-name
http(s)://domain.tld/section-name/category1-name/ <-- correct
http(s)://domain.tld/section-name/category1-name/category2-name/ <-- correct
http(s)://domain.tld/?c=category1-name
http(s)://domain.tld/?c=category2-name
etc.

So, what would the recommendation be in order to have the desired canonical url in there?
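
To make the mismatch concrete, here is a rough Python model of the list above. It is not Textpattern code and says nothing about how txp resolves context internally. The function name `canonical_for` is hypothetical, and the rule that a lone `?c=` request has no derivable clean URL (because the section is missing from it) is an assumption made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def canonical_for(url):
    """Return a clean-URL candidate for `url`, or None if one cannot be derived.

    Hypothetical helper: models the desired result, not Textpattern's behaviour.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    base = f"{parts.scheme}://{parts.netloc}"
    at_root = parts.path in ("", "/")

    if at_root and "s" in query:
        # Messy-mode section request: rewrite to the clean section URL.
        return f"{base}/{query['s'][0]}/"
    if at_root and "c" in query:
        # Category-only request: the section is unknown (assumption), so give up.
        return None
    # Already a clean path: keep it, dropping any query string and fragment.
    return base + (parts.path or "/")
```

With this model, `canonical_for("https://domain.tld/?s=section-name")` gives the clean section URL, while a clean path with a stray `?whatever` appended collapses back to the path alone.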



#20 2020-07-03 06:17:19

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Duplicate content

colak wrote #324221:

So, what would the recommendation be in order to have the desired canonical url in there?

Have you tried <link rel="canonical" href="<txp:page_url context />" />?


#21 2020-07-03 06:28:21

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Duplicate content

That indeed appears to be working! Thanks so much Oleg.



#22 2020-07-03 08:09:23

zero
Member
From: Lancashire
Registered: 2004-04-19
Posts: 1,470
Website

Re: Duplicate content

When I said Oleg’s code worked perfectly for me, I had not tested submitting a comment.

I’ve tried this:

<txp:variable name="foo" value='<txp:permlink />"?commented=0#txpCommentInputForm"' />
<txp:if_variable name="foo">
<txp:else />
<txp:evaluate query='"<txp:site_url trim="/" /><txp:page_url type="req" />" != "<txp:page_url context />"'>
    <txp:txp_die status="404" />
</txp:evaluate>
</txp:if_variable>

This doesn’t go to the 404, but the txp:evaluate also stops working.

Is there a way to allow comment submissions but still trim all other ?whatever from the url?
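
The check being asked for can be modelled outside Textpattern. The Python below is only a sketch of the comparison logic, not a drop-in template: `should_404` and `ALLOWED_PARAMS` are hypothetical names, and allow-listing the `commented` parameter is an assumption based on the `?commented=0` URL quoted above. The `#txpCommentInputForm` fragment is ignored, since browsers do not send fragments to the server.

```python
from urllib.parse import urlsplit, parse_qsl

# Query parameters that may be present without triggering a 404 (assumption).
ALLOWED_PARAMS = {"commented"}


def should_404(requested_url, canonical_url, allowed=ALLOWED_PARAMS):
    """Return True when the request differs from the canonical URL
    by anything other than an allow-listed query parameter."""
    req = urlsplit(requested_url)
    canon = urlsplit(canonical_url)

    # Scheme, host and path must match the canonical URL exactly.
    if (req.scheme, req.netloc, req.path) != (canon.scheme, canon.netloc, canon.path):
        return True

    # keep_blank_values=True so a bare "?whatever" (no "=") is still seen.
    names = {name for name, _ in parse_qsl(req.query, keep_blank_values=True)}
    return not names <= set(allowed)
```

Under these assumptions, a comment submission URL such as `/title?commented=0` passes, while `/title?whatever` is rejected. Translating this into tags would mean testing the query parameters separately rather than comparing the whole requested URL as a string.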



#23 2020-07-03 09:42:03

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Duplicate content

zero wrote #324228:

Is there a way to allow comment submissions but still trim all other ?whatever from the url?

I would think that ?whatever-type URLs only appear in article lists, and comments usually only appear in individual articles, so you could enclose the code in txp:if_article_list.

Last edited by colak (2020-07-03 09:47:05)



#24 2020-07-03 09:47:53

zero
Member
From: Lancashire
Registered: 2004-04-19
Posts: 1,470
Website

Re: Duplicate content

colak wrote #324229:

I would think that ?whatever only appears in article lists, so you could enclose the code in txp:if_article_list

?whatever can appear anywhere. Try it at the end of any article url.


