Textpattern CMS support forum
Re: Duplicate content
Thanks very much, Yiannis – now I understand what you mean. It took me a few reads to fully grok it (and that’s my brain, not your words), but that explains the situation.
Re: Duplicate content
gaekwad wrote #324199:
I will admit to being a bit baffled by this – no antagonistic intentions, just trying to understand.
If you want search engines to have non-duplicate content, totally understandable – that’s what canonical is for. I’m a bit fuzzy where the 503 / 404 errors are useful – does this mean anyone with a link to a page on your site that’s considered duplicate content gets an error instead of the page content, or is it intended as a search engine housekeeping exercise so any dupes will get rinsed out on the next run?
I’m always a bit fuzzy with coding, sorry. When I wrote earlier ?=* I meant also ?c=* and ?whatever so sorry to have caused you to be more verbose, but thanks because I learned something from it.
Re 503 / 404, both your reasons above are good ones, imho.
Re canonical to sort out duplicate content, yes fine, that’s what it’s for, but this is an extra precaution and I think it’s cleaner to try and get rid of all duplicate content in the first place. G are always changing their algorithms and who knows what hoops they’ll want us to jump through in the future?
Sorry the evaluate code doesn’t work for you, Yiannis, I’m using it on a simple site with just /title URL pattern. I have it in the head of both default and archive pages and it’s perfect for my needs.
Dozy P My attempt at music
Re: Duplicate content
zero wrote #324202:
When I wrote earlier ?=* I meant also ?c=* and ?whatever so sorry to have caused you to be more verbose, but thanks because I learned something from it.
Ah! Got it. Thanks for clarifying. Certainly no need to apologise, either.
Re: Duplicate content
Maybe I should bump this thread.
When using
<link rel="canonical" href="<txp:site_url trim="/" /><txp:page_url />" />
in the head of the document, it will understandably parse the tags whatever the URL is, which makes the canonical semantically wrong.
I.e. it will render
http(s)://domain.tld/section-name/ <--correct
http(s)://domain.tld/?s=section-name
http(s)://domain.tld/section-name/category1-name/ <--correct
http(s)://domain.tld/section-name/category1-name/category2-name/ <--correct
http(s)://domain.tld/?c=category1-name
http(s)://domain.tld/?c=category2-name
etc
So, what would the recommendation be in order to have the desired canonical url in there?
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: Duplicate content
That indeed appears to be working! Thanks so much Oleg.
Yiannis
Re: Duplicate content
When I said Oleg’s code worked perfectly for me, I had not tested submitting a comment.
I’ve tried this:
<txp:variable name="foo" value='<txp:permlink />"?commented=0#txpCommentInputForm"' />
<txp:if_variable name="foo">
<txp:else />
<txp:evaluate query='"<txp:site_url trim="/" /><txp:page_url type="req" />" != "<txp:page_url context />"'>
<txp:txp_die status="404" />
</txp:evaluate>
</txp:if_variable>
This doesn’t go to the 404 but also the txp:evaluate stops working.
Is there a way to allow comment submissions but still trim all other ?whatever from the url?
Dozy P My attempt at music
Re: Duplicate content
zero wrote #324228:
Is there a way to allow comment submissions but still trim all other ?whatever from the url?
I would think that ?whatever-type URLs appear only in article lists, and comments usually appear only in individual articles, so you could enclose the code in txp:if_article_list.
Last edited by colak (2020-07-03 09:47:05)
Yiannis
Re: Duplicate content
zero wrote #324230:
?whatever can appear anywhere. Try it at the end of any article url.
Indeed but why would anybody link to it?
Maybe if you define the canonical in the head, search engines will know:
<txp:if_article_list>
<link rel="canonical" href="<txp:page_url context />" />
<txp:else />
<link rel="canonical" href="<txp:permlink />" />
</txp:if_article_list>
Yiannis
Re: Duplicate content
Yes, I know that Yiannis, but I’m stubbornly wanting to remove reliance on canonicals by using redirects and stripping away ?whatever instead.
Dozy P My attempt at music
Re: Duplicate content
zero wrote #324232:
Yes, I know that Yiannis, but I’m stubbornly wanting to remove reliance on canonicals by using redirects and stripping away ?whatever instead.
I’m scratching my head but the cases may be too many. By default txp has ?q=, ?s=, ?c=, ?author=, ?id=… I hope I did not forget anything, except to mention that there are more, generated by plugins.
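One way to sidestep an ever-growing list is to invert it: drop every query parameter by default and keep only the few a page genuinely needs. The following is a minimal, language-neutral sketch of that idea in Python rather than in Textpattern tags. The name `strip_query` and its `keep` argument are hypothetical, chosen for illustration, and are not part of Textpattern or any plugin.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_query(url, keep=()):
    """Return url with every query parameter removed except those named in keep.

    Hypothetical helper: an allowlist means parameters added by plugins
    are dropped without having to be listed one by one.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    ]
    # urlunsplit omits the "?" when the rebuilt query string is empty.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_query("https://example.org/articles/?c=news&foo=bar"))
    print(strip_query("https://example.org/?q=searchterm&x=1", keep=("q",)))
```

The cleaned URL could then serve as a redirect target or be compared with the requested URL; which of the two is preferable depends on the site.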
Yiannis
Re: Duplicate content
Thanks for trying, Yiannis, I must be a very trying person! Oleg’s code in the head of default and archive pages (which are all I am using in this case) removes all those you mention from the url. The only exception I have found is that when I use search (the normal search_input) it works and even shows example.org/?q=searchterm as the url.
I can see how some plugins might use ?c=, ?s=, ?id=, ?author= and probably other strings but if I’m not using those particular ones, it’s not a problem. I’m using categories but colour-coding them and not linking them, so someone can use search to find a category of interest. So at the moment I have no duplicate lists of excerpts or articles and nobody can produce one via a category list or author list. So Gargoyle won’t find a duplicate either, canonicalized or not.
The only problem I’d like to overcome is to get comments working as expected. With Oleg’s code in place, comments are previewed and submitted. However, on Submit the visitor is taken to a 404 and doesn’t know her comment has been submitted.
Dozy P My attempt at music
Re: Duplicate content
Oh, I must have forgotten about comments, sorry. Also, not sure about anchors. Could you try to replace <txp:page_url context /> with
<txp:page_url context="id, s, c, context, q, m, month, author, commented" />
and report back, please?
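For anyone following along, here is a rough model of what listing the parameters is meant to achieve, written in Python purely as an illustration. It assumes the check should fail only when the request carries a parameter outside the list; how Textpattern actually builds and compares the two URLs may differ. `ALLOWED` and `status_for` are hypothetical names, not Textpattern API.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameter names copied from the context list suggested above.
ALLOWED = frozenset(
    {"id", "s", "c", "context", "q", "m", "month", "author", "commented"}
)


def status_for(requested_url, allowed=ALLOWED):
    """Return 200 if every query parameter name is allowed, otherwise 404.

    Assumption: only parameter names matter, not their values.
    """
    query = urlsplit(requested_url).query
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return 200 if names <= allowed else 404


if __name__ == "__main__":
    print(status_for("https://example.org/my-article?commented=1#txpCommentInputForm"))
    print(status_for("https://example.org/my-article?whatever=1"))
```

Under this model the post-submit URL with `commented` passes, while an arbitrary `?whatever` still gets a 404. As for anchors, browsers do not send the fragment (the part after `#`) to the server, so a server-side check never sees it.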
Re: Duplicate content
In txp versions prior to 4.8, I used to use the zem_redirect plugin which dealt with a lot of these issues. Unfortunately it no longer works for deeper structures. It may however work for you. It’s plug n play. Install it, enable it and all should be working as expected. Apologies for not thinking about it earlier.
> Edited to add that you may no longer need Oleg’s suggestion with this plugin which also protects from some script injections.
> Edit 2. Oleg was, as usual, faster with a native solution.
Last edited by colak (2020-07-03 16:04:11)
Yiannis