Textpattern CMS support forum

#1 2019-06-06 06:23:47

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 8,626
Website GitHub Twitter

Sitemaps

I am creating a new thread for this as it may be of use to others. A few months ago, Julian came up with this solution to create sitemaps, which works wonderfully. Is there a way though to add pages to the code? ie http://www.site.tld/section/?pg=2 … etc?

On the other hand is this needed?


Yiannis
——————————
neme.org | hblack.net | State Machines | NeMe @ github
I do my best editing after I click on the submit button.

Offline

#2 2019-06-06 08:05:46

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,149
Website

Re: Sitemaps

colak wrote #318381:

Is there a way though to add pages to the code? ie http://www.site.tld/section/?pg=2 … etc?

Not something I’ve tried in sitemap.xml but an interesting idea.

I don’t know how much actual SEO benefit this will bring when each of the articles on your sub-pages has its own URL. It would make sense on sections with running lists of posts that don’t have their own URL, e.g. like in a forum, or a tumblr-style log where posts are short and don’t link deeper.

It’s probably easiest if your pagination is predictable. If your users can change the number of items per page, sort in different directions or apply filters, you’ll need to cater for each and every filter situation. By way of example: Google’s search results for this forum are based on a different number of posts per page than my personal settings, causing them to be consistently wrong for me.

If you do have static pagination settings, you can use the following to determine the number of pages:

<txp:article_custom section="your-section" pageby="10" pgonly />

where the value for pageby should match the one you are using on your section for limit. See the article_custom docs

In combination with a txp:variable and something like rah_repeat you could output the sitemap links as follows, e.g.

<txp:variable name="section_pages"><txp:article_custom section="your-section" pageby="10" pgonly /></txp:variable>
<txp:if_variable name="section_pages" value="1" not>
	<txp:rah_repeat range='2, <txp:variable name="section_pages" />'>
		<url>
			<loc><txp:section url="1" />?pg=<txp:rah_repeat_value /></loc>
		</url>
	</txp:rah_repeat>
</txp:if_variable>

I started range with 2 rather than 1 as you don’t need ?pg=1 because it’s the same as the section landing page. The if_variable skips it if only one page exists.
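
For anyone who wants to sanity-check the numbers outside Textpattern, here is a rough Python sketch of the same arithmetic. It is only an illustration: the function name `paginated_urls`, the article count and the section URL are made up, and it assumes the section URL ends in a slash as in the example above.

```python
import math


def paginated_urls(section_url: str, article_count: int, per_page: int = 10) -> list[str]:
    """Return the ?pg=N URLs for pages 2..last of a section's article list.

    Page 1 is skipped because it duplicates the section landing page.
    """
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    # Same count that pageby/pgonly reports: articles divided by page size, rounded up.
    total_pages = math.ceil(article_count / per_page)
    # range(2, ...) is empty when there is only one page, like the if_variable check.
    return [f"{section_url}?pg={page}" for page in range(2, total_pages + 1)]


if __name__ == "__main__":
    for url in paginated_urls("http://www.site.tld/section/", article_count=25):
        print(url)
```

With 25 articles at 10 per page there are three pages, so only ?pg=2 and ?pg=3 are listed; with 10 or fewer articles nothing is output.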

On the other hand is this needed?

This SEO Guide to URL Parameter Handling discusses different approaches and seems to suggest it is sufficient to mark up your pages for clear crawling as follows:

<link rel="prev" href="https://www.example.com/category?page=2">
<link rel="next" href="https://www.example.com/category?page=4">
<link rel="canonical" href="https://www.example.com/category?page=3">

where the canonical url is the current page in the paginated view.
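
To make the rule explicit, here is a small Python sketch of which link tags each page in a paginated list would get. The helper name `pagination_links` and its parameters are hypothetical, and it assumes that page 1 is addressed by the bare URL without a page parameter; adjust if your site does it differently.

```python
from html import escape


def pagination_links(base_url: str, page: int, total_pages: int, param: str = "page") -> list[str]:
    """Build rel=prev/next/canonical link tags for one page of a paginated list."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def url_for(n: int) -> str:
        # Assumption: page 1 is the bare list URL, later pages carry the parameter.
        return base_url if n == 1 else f"{base_url}?{param}={n}"

    links = []
    if page > 1:  # the first page has no previous page
        links.append(f'<link rel="prev" href="{escape(url_for(page - 1))}">')
    if page < total_pages:  # the last page has no next page
        links.append(f'<link rel="next" href="{escape(url_for(page + 1))}">')
    # The canonical URL always points at the page currently being viewed.
    links.append(f'<link rel="canonical" href="{escape(url_for(page))}">')
    return links


if __name__ == "__main__":
    print("\n".join(pagination_links("https://www.example.com/category", 3, 5)))
```

Calling it for page 3 of 5 reproduces the three lines quoted above; the first page gets no prev link and the last page gets no next link.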

It also discusses using the URL Parameter Tool in Google Search Console to identify the pagination url parameter. At the end the author compares the different methods and suggests using the above without including paginated pages in sitemap.xml.


TXP Builders – finely-crafted code, design and txp

Offline

#3 2019-06-06 09:01:53

philwareham
Core designer
From: Haslemere, Surrey, UK
Registered: 2009-06-11
Posts: 3,523
Website GitHub Twitter

Re: Sitemaps

For best SEO (to avoid duplicate content and/or content changes over time), I ensure that only the first page of each article list is set to <meta name="robots" content="index, follow">; all subsequent list pages are set to <meta name="robots" content="noindex, follow">.

Individual articles of course are always <meta name="robots" content="index, follow">.

See <head> example below:

<head>
    ... global head content ...
    <txp:if_article_list>
        ... article list-specific head content ...
        <txp:article pgonly limit="12" />
        <txp:variable name="page" value='<txp:page_url type="pg" />' />
        <txp:if_variable name="page" value="1">
            <meta name="robots" content="index, follow">
            <link rel="canonical" href="<txp:section url />">
            ... other article-list specific head content such as JSON-LD, Open Graph and Twitter Cards ...
        <txp:else />
            <meta name="robots" content="noindex, follow">
        </txp:if_variable>
        <txp:evaluate test>
            <link rel="prev" href="<txp:newer />">
        </txp:evaluate>
        <txp:evaluate test>
            <link rel="next" href="<txp:older />">
        </txp:evaluate>
    <txp:else />
        ... individual article-specific head content ...
        <meta name="robots" content="<txp:if_expires>unavailable_after: <txp:expires gmt format="%d-%b-%y %T" /> GMT<txp:else />index, follow</txp:if_expires>">
        <link rel="canonical" href="<txp:permlink />">
        ... other individual article-specific head content such as JSON-LD, Open Graph and Twitter Cards ...
    </txp:if_article_list>
</head>

For category list pages I set all these to <meta name="robots" content="noindex, follow"> because they are just duplicating content that is already in article lists (although filtered down to a specific category).

Then in your sitemap.xml page template you won’t need to include any category pages at all, and just the overall section (page 1) URL, something like so:

<?xml version="1.0" encoding="UTF-8"?><txp:header value="application/xml; charset=utf-8" />
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
    <loc><txp:site_url /></loc>
</url>
<txp:section_list break="" exclude="sitemap">
<url>
    <loc><txp:section url /></loc>
</url>
</txp:section_list>
<txp:article_custom section="sections,with,articles" limit="9999">
<url>
    <loc><txp:permlink /></loc>
</url>
</txp:article_custom>
</urlset>

Note that I don’t bother with <lastmod> XML tag as it’s pretty much worthless (Google and most other search engines give no importance to that field since it has been so abused over the years – they can never trust the value as accurate, so ignore it).

Offline

#4 2019-06-07 04:20:23

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 8,626
Website GitHub Twitter

Re: Sitemaps

Thanks so much guys. I’ll go with Phil’s suggestion to include link rel="prev/next" in the head and the content="noindex, follow" directive. The problem only exists in the blog section of our site, and is accentuated by the lack of internal links to most of the articles posted there.


Yiannis
——————————
neme.org | hblack.net | State Machines | NeMe @ github
I do my best editing after I click on the submit button.

Offline

#5 2019-06-07 04:59:49

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 8,626
Website GitHub Twitter

Re: Sitemaps

Just a couple of small additions to Julian’s sitemap in order to include the home page and sticky articles:

<txp:header value="application/xml; charset=utf-8" /><?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc><txp:site_url /></loc></url>
<txp:section_list break="" exclude="sections,to,exclude">
<url>
    <loc><txp:section url="1" /></loc>
<txp:evaluate test="article_custom">
    <lastmod><txp:article_custom section='<txp:section />' limit="1" sort="LastMod desc" status><txp:modified format="%Y-%m-%d" /></txp:article_custom></lastmod>
</txp:evaluate>
</url>
</txp:section_list>
<txp:article_custom section="sections,with,articles" limit="9999" status>
<url>
    <loc><txp:permlink /></loc>
    <lastmod><txp:modified format="%Y-%m-%d" /></lastmod>
</url>
</txp:article_custom>
<txp:category_list exclude="categories,to,exclude" break="">
<url>
    <loc><txp:category url="1" /></loc>
</url>
</txp:category_list>
</urlset>

Yiannis
——————————
neme.org | hblack.net | State Machines | NeMe @ github
I do my best editing after I click on the submit button.

Offline

#6 2021-05-10 17:05:32

John-Paul F
Member
Registered: 2021-03-15
Posts: 18
Website Twitter

Re: Sitemaps

colak wrote #318403:

Just a couple of small additions to julian’s sitemap in order to include the home page and sticky articles

<txp:header value="application/xml; charset=utf-8" /><?xml version="1.0" encoding="UTF-8"?>...

Hello – I realise I don’t seem to have a sitemap, and this may explain my sensationally bad Google Analytics stats. Is this still a good approach to take? And do I need to do the thing Julian mentions with my .htaccess file (in the post that led to the creation of this new thread)?

Thank you in advance team Txp.


Strictly Amateur

Offline

#7 2021-05-10 17:20:23

philwareham
Core designer
From: Haslemere, Surrey, UK
Registered: 2009-06-11
Posts: 3,523
Website GitHub Twitter

Re: Sitemaps

Hi John-Paul, yes a sitemap.xml is helpful in telling search engines what to index/what not to index on your site, and page priorities for each entry.

You’ll need to create a section called ‘sitemap’, assign it a page template similar to what’s been posted above, and then do one of the following:

1. In Apache .htaccess file…

<IfModule mod_rewrite.c>
    RewriteRule ^sitemap\.xml$ index.php?s=sitemap [L]
</IfModule>

Note: If you already have an <IfModule mod_rewrite.c> block in .htaccess (which you should do for the Textpattern rules), just include the RewriteRule line in your existing block.

2. In Nginx config…

location = /sitemap.xml {
    rewrite "sitemap.xml" /index.php?s=sitemap;
}

Then you’ll need to go to your Google Search Console account for the domain, and under the sitemaps tab enter the location of your sitemap.xml. Do the same for Bing Webmaster Tools and/or any other search engines that you feel are important and that provide such tools (e.g. Yandex if the Russian region is important to your domain).

Whilst in Search Console check out and fix any other issues that it reports – which will help your overall ranking scores to a degree.
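
Before submitting, it can be worth checking that the generated sitemap is well-formed and lists the URLs you expect. The following is a minimal Python sketch for that, not part of Textpattern; the function name `extract_locs` is made up, and it assumes you have already saved or fetched the sitemap as text.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap templates posted in this thread.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap, or raise if the XML is not usable."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError("root element is not a sitemap <urlset>")
    # Only <loc> elements wrapped in a <url> element count as sitemap entries.
    locs = root.findall("sm:url/sm:loc", {"sm": SITEMAP_NS})
    return [loc.text.strip() for loc in locs if loc.text]


if __name__ == "__main__":
    sample = (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>https://www.example.com/</loc></url>"
        "<url><loc>https://www.example.com/articles/</loc></url>"
        "</urlset>"
    )
    for url in extract_locs(sample):
        print(url)
```

An empty list, or an error, usually means the page template is outputting something other than a plain urlset, for example stray HTML from a shared form.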

Offline

#8 2021-05-12 08:45:46

John-Paul F
Member
Registered: 2021-03-15
Posts: 18
Website Twitter

Re: Sitemaps

Thank you, Phil.


Strictly Amateur

Offline
