Textpattern CMS support forum


#1 2019-06-06 06:23:47

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 7,271

Sitemaps

I am creating a new thread for this as it may be useful to others. A few months ago, Julian came up with this solution to create sitemaps, which works wonderfully. Is there a way, though, to add paginated list pages to the code, i.e. http://www.site.tld/section/?pg=2 … etc.?

On the other hand, is this needed?


Yiannis
——————————
neme.org | hblack.net | LABS | State Machines | Respbublika! | NeMe @ github


#2 2019-06-06 08:05:46

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 3,453

Re: Sitemaps

colak wrote #318381:

Is there a way though to add pages to the code? i.e. http://www.site.tld/section/?pg=2 … etc.?

Not something I’ve tried in sitemap.xml but an interesting idea.

I don’t know how much actual SEO benefit this will bring when each of your articles on sub-pages has its own URL. It would make sense on sections with running lists of posts that don’t have their own URLs, e.g. in a forum, or a tumblr-style log where posts are short and don’t link deeper.

It’s probably easiest if your pagination is predictable. If your users can change the number of items per page, sort in different directions or apply filters, you’ll need to cater for each and every combination. By way of example: Google’s search results for this forum are based on a different number of posts per page than my personal settings, so the page links are consistently wrong for me.

If you do have static pagination settings, you can use the following to determine the number of pages:

<txp:article_custom section="your-section" pageby="10" pgonly />

where the value for pageby should match the limit you are using on your section. See the article_custom docs.

In combination with a txp:variable and something like rah_repeat, you could output the sitemap links as follows:

<txp:variable name="section_pages"><txp:article_custom section="your-section" pageby="10" pgonly /></txp:variable>
<txp:if_variable name="section_pages" value="1" not>
	<txp:rah_repeat range='2, <txp:variable name="section_pages" />'>
		<url>
			<loc><txp:section url="1" />?pg=<txp:rah_repeat_value /></loc>
		</url>
	</txp:rah_repeat>
</txp:if_variable>

I started range with 2 rather than 1 as you don’t need ?pg=1 because it’s the same as the section landing page. The if_variable skips it if only one page exists.
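
To make the arithmetic behind that loop concrete, here is a minimal Python sketch of the same logic outside Textpattern. It assumes a fixed number of items per page. The function name paginated_urls and the article count of 25 are made up for illustration, and the URL is the placeholder from the question.

```python
from __future__ import annotations

from math import ceil


def paginated_urls(section_url: str, article_count: int, per_page: int) -> list[str]:
    """Return the ?pg=N URLs for pages 2..last of a section list."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Number of list pages, the value pgonly reports in the template version.
    pages = ceil(article_count / per_page)
    # Page 1 is the section landing page itself, so start at 2.
    return [f"{section_url}?pg={n}" for n in range(2, pages + 1)]


# Hypothetical section with 25 articles listed 10 per page.
for url in paginated_urls("http://www.site.tld/section/", article_count=25, per_page=10):
    print(url)
```

A section that fits on a single page yields an empty list, which corresponds to the if_variable check skipping the loop.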

On the other hand, is this needed?

This SEO Guide to URL Parameter Handling discusses different approaches and seems to suggest it is sufficient to mark up your pages for clear crawling as follows:

<link rel="prev" href="https://www.example.com/category?page=2">
<link rel="next" href="https://www.example.com/category?page=4">
<link rel="canonical" href="https://www.example.com/category?page=3">

where the canonical URL is the current page in the paginated view.

It also discusses using the URL Parameter Tool in Google Search Console to identify the pagination url parameter. At the end the author compares the different methods and suggests using the above without including paginated pages in sitemap.xml.
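
As a rough illustration of that markup pattern, the following Python sketch builds the three link tags for any page of a paginated list. The function name pagination_links and its parameters are hypothetical, and treating page 1 as the bare list URL is an assumption of this sketch rather than something the guide prescribes.

```python
from html import escape


def pagination_links(base_url, page, last_page, param="page"):
    """Build the prev/next/canonical <link> tags for one page of a list."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def url_for(n):
        # Assumption: page 1 is served at the bare list URL.
        return base_url if n == 1 else f"{base_url}?{param}={n}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(url_for(page - 1))}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(url_for(page + 1))}">')
    # The canonical URL always points at the page currently being viewed.
    tags.append(f'<link rel="canonical" href="{escape(url_for(page))}">')
    return tags


for tag in pagination_links("https://www.example.com/category", page=3, last_page=5):
    print(tag)
```

For page 3 of 5 this reproduces the three tags quoted above. The first page gets no prev link and the last page gets no next link.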


TXP Builders – finely-crafted code, design and txp


#3 2019-06-06 09:01:53

philwareham
Core designer
From: Farnham, Surrey, UK
Registered: 2009-06-11
Posts: 3,196

Re: Sitemaps

For best SEO (to avoid duplicate content and/or content changes over time), I ensure that only the first page of an article list is set to <meta name="robots" content="index, follow">; all subsequent list pages are set to <meta name="robots" content="noindex, follow">.

Individual articles of course are always <meta name="robots" content="index, follow">.

See <head> example below:

<head>
    ... global head content ...
    <txp:if_article_list>
        ... article list-specific head content ...
        <txp:article pgonly limit="12" />
        <txp:variable name="page" value='<txp:page_url type="pg" />' />
        <txp:if_variable name="page" value="1">
            <meta name="robots" content="index, follow">
            <link rel="canonical" href="<txp:section url />">
            ... other article-list specific head content such as JSON-LD, Open Graph and Twitter Cards ...
        <txp:else />
            <meta name="robots" content="noindex, follow">
        </txp:if_variable>
        <txp:evaluate test>
            <link rel="prev" href="<txp:newer />">
        </txp:evaluate>
        <txp:evaluate test>
            <link rel="next" href="<txp:older />">
        </txp:evaluate>
    <txp:else />
        ... individual article-specific head content ...
        <meta name="robots" content="<txp:if_expires>unavailable_after: <txp:expires gmt format="%d-%b-%y %T" /> GMT<txp:else />index, follow</txp:if_expires>">
        <link rel="canonical" href="<txp:permlink />">
        ... other individual article-specific head content such as JSON-LD, Open Graph and Twitter Cards ...
    </txp:if_article_list>
</head>

For category list pages I set all these to <meta name="robots" content="noindex, follow"> because they are just duplicating content that is already in article lists (although filtered down to a specific category).
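
The indexing rules described in this post can be summarised as a small decision function. This is only a sketch in Python of the logic, not Textpattern code: the name robots_content and its parameters are invented for illustration, and it leaves out the unavailable_after case for expiring articles.

```python
def robots_content(is_article_list, page=1, is_category_list=False):
    """Return the robots meta value for a page under the rules above."""
    if not is_article_list:
        # Individual articles are always indexed.
        return "index, follow"
    if is_category_list or page > 1:
        # Category lists and list pages beyond the first repeat content
        # that is already reachable elsewhere.
        return "noindex, follow"
    # First page of a section's article list.
    return "index, follow"


print(robots_content(is_article_list=True, page=1))
print(robots_content(is_article_list=True, page=2))
print(robots_content(is_article_list=True, page=1, is_category_list=True))
print(robots_content(is_article_list=False))
```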

Then in your sitemap.xml page template you won’t need to include any category pages at all, only the overall section (page 1) URLs, something like this:

<?xml version="1.0" encoding="UTF-8"?><txp:header value="application/xml; charset=utf-8" />
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
    <loc><txp:site_url /></loc>
</url>
<txp:section_list break="" exclude="sitemap">
<url>
    <loc><txp:section url /></loc>
</url>
</txp:section_list>
<txp:article_custom section="sections,with,articles" limit="9999">
<url>
    <loc><txp:permlink /></loc>
</url>
</txp:article_custom>
</urlset>

Note that I don’t bother with the <lastmod> XML tag as it’s pretty much worthless: Google and most other search engines give no importance to that field since it has been so abused over the years. They can never trust the value as accurate, so they ignore it.


#4 2019-06-07 04:20:23

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 7,271

Re: Sitemaps

Thanks so much, guys. I’ll go with Phil’s suggestion to include link rel="prev/next" in the head and the content="noindex, follow" directive. The problem only exists in the blog section of our site, and it is accentuated by the fact that most articles posted there are not linked internally.




#5 2019-06-07 04:59:49

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 7,271

Re: Sitemaps

Just a couple of small additions to Julian’s sitemap in order to include the home page and sticky articles:

<txp:header value="application/xml; charset=utf-8" /><?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc><txp:site_url /></loc></url>
<txp:section_list break="" exclude="sections,to,exclude">
<url>
    <loc><txp:section url="1" /></loc>
<txp:evaluate test="article_custom">
    <lastmod><txp:article_custom section='<txp:section />' limit="1" sort="LastMod desc" status><txp:modified format="%Y-%m-%d" /></txp:article_custom></lastmod>
</txp:evaluate>
</url>
</txp:section_list>
<txp:article_custom section="sections,with,articles" limit="9999" status>
<url>
    <loc><txp:permlink /></loc>
    <lastmod><txp:modified format="%Y-%m-%d" /></lastmod>
</url>
</txp:article_custom>
<txp:category_list exclude="categories,to,exclude" break="">
<url>
    <loc><txp:category url="1" /></loc>
</url>
</txp:category_list>
</urlset>


