Textpattern CMS support forum

#109 2008-04-01 01:17:49

kevinpotts
Member
From: Ghost Coast
Registered: 2004-12-07
Posts: 370

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

jm wrote:

Since Dreamhost users have been having problems editing the plugin, here’s a 4.0.6-compatible version:

Thanks for this. Appreciated.

BTW, my offer to pay someone to enhance this plugin stands. Just ping me either here or on my e-mail.

Last edited by kevinpotts (2008-04-01 01:29:25)


Kevin
(graphicpush)

Offline

#110 2008-04-01 02:00:25

jm
Plugin Author
From: Missoula, MT
Registered: 2005-11-27
Posts: 1,746
Website

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

I’ll give it a shot. What are the current bugs and desired features?

Offline

#111 2008-04-01 02:06:23

kevinpotts
Member
From: Ghost Coast
Registered: 2004-12-07
Posts: 370

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

Whatever the community desires, I think. For me personally, there are three essentials:

  1. 4.0.6 compatible out of the box — no hacks
  2. Follows Google’s protocol exactly (I don’t want to worry whether it works or not)
  3. Select which sections are included / excluded from the rendered sitemap

My ideal situation is to produce a new version (jmd_you_r_awesome, or whatever) and kill asy_sitemap since development has obviously ceased.

Last edited by kevinpotts (2008-04-01 02:07:53)


Kevin
(graphicpush)

Offline

#112 2008-04-01 04:35:23

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907
Website

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

To jm:
  1. Correct auto-update of sitemap after article update and new article creation
  2. Don't ping Google on every update; accumulate all changes and ping every n hours, for example every 25 hours (so Google doesn't think we are spammers :)
  3. Show a stats page where the number of links in the sitemap can be seen; ideally also show a little mention of the update on the article page, somewhere at the bottom of the page

Last edited by the_ghost (2008-04-01 04:36:41)


Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebook?

Offline

#113 2008-04-01 05:07:34

jm
Plugin Author
From: Missoula, MT
Registered: 2005-11-27
Posts: 1,746
Website

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

Thanks for the input guys. So we’ve got:

  1. 4.0.6 compatibility
  2. Follow sitemaps.org protocol
  3. Exclude sections
  4. Update sitemap after each article creation and modification
  5. Delayed pinging
  6. Stats page

#2: I think priority shouldn’t be included for individual articles – it’s an optional element that’s prone to overuse. However, priority for each section might be useful. Yay or nay?

#3: How do you want this displayed? The simplest method would be a multiselect, but if we introduce priority, we would need something akin to the presentation>sections tab (hopefully not a giant vertical page).
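
For illustration, items #2 and #3 could be sketched roughly as follows. This is Python rather than the plugin's PHP, and the article fields (`url`, `section`, `lastmod`) and the function name `build_sitemap` are assumptions made up for the example, not anything taken from asy_sitemap. It emits only `loc` and `lastmod` in the sitemaps.org namespace and skips any excluded section:

```python
from datetime import datetime
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(articles, excluded_sections=()):
    """Return sitemap XML as a string, skipping excluded sections.

    `articles` is an iterable of dicts with the hypothetical keys
    'url' (str), 'section' (str) and 'lastmod' (datetime).
    """
    excluded = set(excluded_sections)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for article in articles:
        if article["section"] in excluded:
            continue  # item #3: leave excluded sections out entirely
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = article["url"]
        # lastmod uses the W3C date format (YYYY-MM-DD is sufficient).
        ET.SubElement(url, "lastmod").text = article["lastmod"].strftime("%Y-%m-%d")
        # priority is optional in the protocol and deliberately omitted.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A multiselect in the prefs would then only need to supply the list passed as `excluded_sections`.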

#5 & #6: Since cron isn’t a user-friendly option, I’ll create a database table that stores the following info:
  • Time since last ping
  • Number of links
  • Last update/regeneration

So at each generation, a query would run to check the time of the last ping, and if n hours (set in prefs) or more had passed, it would ping Google. As the sitemap was being generated, the number of links (url entries) would be stored alongside the lastmod of the last article.
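
The check described above might look something like this. It is only a sketch in Python, with a plain dict standing in for the database row; the names `should_ping`, `regenerate` and the dict keys are hypothetical, and `ping` is whatever callable actually notifies the search engine:

```python
from datetime import datetime, timedelta


def should_ping(last_ping, now, min_hours):
    """True if we have never pinged, or at least min_hours have passed."""
    if last_ping is None:
        return True
    return now - last_ping >= timedelta(hours=min_hours)


def regenerate(state, now, link_count, last_article_mod, min_hours, ping):
    """Record stats for this generation and ping only if the delay has elapsed.

    `state` stands in for the single stats row; returns True if a ping was sent.
    """
    state["link_count"] = link_count            # number of url entries written
    state["last_generated"] = now               # last update/regeneration
    state["last_article_mod"] = last_article_mod
    if should_ping(state.get("last_ping"), now, min_hours):
        ping()
        state["last_ping"] = now
        return True
    return False
```

Because the check runs at generation time, no cron job is needed; the trade-off is that a ping can only happen when something triggers a regeneration.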

Offline

#114 2008-04-01 08:45:21

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

jm

Cool that you’re getting things moving on this plugin. Perhaps my post on the subject was too big and people couldn’t be arsed to read it ;-)

I still don’t see how this plugin helps search engines and would love someone to explain whether a sitemap is a statement of intent or fact and give some concrete examples of how it helps Googlebot index a site better; imo it does an admirable job on its own if you get your SEO right.

As for features, would it make sense to get the plugin to learn and adapt the priority/frequency based on frequency of updates in a section? It seems at the moment it just sets the update frequency to a fixed amount, which is surely unwise given the vast array of sites out there. For example: why should a blog post be set to change frequently? Once it’s posted, that’s pretty much it, right? (not counting comments and a couple of edits). And after the comments have expired and the article is no longer modified, why should the sitemap remain set to update the article frequently? It has essentially become a static page.
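
One possible way to derive the frequency, sketched in Python: take the average gap between a section's update timestamps and map it to one of the changefreq values the sitemaps.org protocol allows. The thresholds and the `yearly` fallback are arbitrary assumptions chosen for the example, and `section_changefreq` is a made-up name:

```python
from datetime import datetime, timedelta


def section_changefreq(update_times):
    """Map a section's update timestamps to a sitemaps.org changefreq value."""
    times = sorted(update_times)
    if len(times) < 2:
        # Too little history to measure a gap; assumed conservative default.
        return "yearly"
    avg_gap = (times[-1] - times[0]) / (len(times) - 1)
    # Thresholds below are illustrative, not part of the protocol.
    if avg_gap <= timedelta(hours=1):
        return "hourly"
    if avg_gap <= timedelta(days=1):
        return "daily"
    if avg_gap <= timedelta(days=7):
        return "weekly"
    if avg_gap <= timedelta(days=31):
        return "monthly"
    return "yearly"
```

A section that stops being updated would keep its old average under this scheme, so a real version would probably also want to weigh the time since the most recent update.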

Also, without setting priority there’s no way for bots to judge the relative importance of pages or sections so people might as well leave the bot alone to do its crawly stuff without trying to give it misguided hints. Or am I wrong? (… probably!)

In terms of plugin usability I still think all XML/robots.txt and config files should be updated automatically by the plugin. There’s too much intervention and messing around with FTP required right now (points 9 and 10 in my original post). This should be fairly easy to achieve.
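
As a rough idea of the robots.txt part, here is a Python sketch that adds the `Sitemap:` line defined by the sitemaps.org protocol if it is not already there. It works on the file's text only; reading and writing the actual file, and the function name `ensure_sitemap_line`, are assumptions left out of or invented for the example:

```python
def ensure_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content that contains 'Sitemap: <sitemap_url>'.

    The text is returned unchanged if the directive is already present.
    """
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, nothing to do
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + "Sitemap: " + sitemap_url + "\n"
```

Calling it twice with the same URL leaves the content unchanged, so it would be safe to run on every regeneration.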

It’s great that you’re willing to tackle this, jm. I did some preliminary work on updating it a while ago, but since very few said what they wanted it to actually do and whether it was truly useful, I gave up trying to guess! Shoot me a mail if you wanna chat about what I tried (it wasn’t much, but might help).


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#115 2008-04-01 09:43:15

trenc
Plugin Author
From: Malmö
Registered: 2008-02-27
Posts: 572
Website GitHub

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

Only one quickie:
Permanently pinging Google with the sitemap.xml (or its gz-compressed version) is not needed. Once you have pinged Google, used the Webmaster Tools, and/or pointed to the sitemap's location in robots.txt, no further ping is necessary. Google does its job by itself.

And: a sitemap.xml only makes sense if you have many pages (blog, portal, or something else) and these pages are not linked anywhere in your website, so Googlebot can’t follow a link to them. If you have an archive page or a sitemap page, Googlebot can follow those links and no sitemap.xml is needed.

Last edited by trenc (2008-04-01 09:44:17)

Offline

#116 2008-04-01 10:12:38

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

trenc

That’s how I interpreted the spec too. Once the bot knows of the existence of the sitemap (via robots.txt or a ping) it’ll read it in future. And, iirc, each time it reads it, the frequencies and priorities in the sitemap tell it what has changed and what’s more important to index. Hence, keeping all priorities and frequencies the same adds no value.

A sitemap.xml only makes sense if you have many pages… and these pages are not linked anywhere in your website, so Googlebot can’t follow a link to them.

I agree. Maybe I’ve missed something, but I can’t see the need for a sitemap in most ordinary web site situations. I’d love to hear how it helps people with a TXP site because, on the surface, it seems a lot of hassle for minimum return…


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#117 2008-04-01 11:40:57

trenc
Plugin Author
From: Malmö
Registered: 2008-02-27
Posts: 572
Website GitHub

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

Bloke wrote:

And, iirc, each time it reads it, the frequencies and priorities in the sitemap tell it what has changed and what’s more important to index. Hence, keeping all priorities and frequencies the same adds no value.

All the priority and frequency stuff is only a guideline, not a rule. In the end Google mainly decides these from the real posting frequency, Last-Modified header data, similarity with other pages of the website, importance, trust from searchers and other websites, etc. So Googlebot may visit one blog 30 times a day, another once a week, and a normal website once a month. It doesn’t really depend on a sitemap; it’s Google’s decision.

If I would know the google algorithm… holy moly… I would be a millionaire or even more. :)

But I’ve found out that, at least, the order in the SERPs of a site:domain.tld query depends a little bit on the given sitemap. But: cui bono?

Offline

#118 2008-04-01 13:49:59

kevinpotts
Member
From: Ghost Coast
Registered: 2004-12-07
Posts: 370

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

jm wrote:

Thanks for the input guys. So we’ve got …

Thanks again for taking this on. While I can offer no assistance in the programming, I would be happy to help with any interface design or usability, and of course I would be happy to be a beta tester.


Kevin
(graphicpush)

Offline

#119 2008-04-01 14:12:16

merz1
Member
From: Hamburg
Registered: 2006-05-04
Posts: 994
Website

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

(Bloke) …I can’t see the need for a sitemap in most ordinary web site situations. I’d love to hear how it helps people with a TXP site…

  • 1 article (updated), one section, five tags, two categories = eight pages = 1 actual article entry on top of the sitemap = fast indexing of the article

Bottom line: a sitemap adds a priority/recommendation for what to index first.

Btw, jm, regarding #5 (delayed pinging) and #6 (stats page):

I would not need that and I would love it only as an optional feature with a switch (On/Off).


Get all online mentions of Textpattern via OPML subscription: TXP Info Sources: Textpattern RSS feeds as dynamic OPML

Offline

#120 2008-04-01 18:25:34

jm
Plugin Author
From: Missoula, MT
Registered: 2005-11-27
Posts: 1,746
Website

Re: asy_sitemap: Google-Sitemap (as-is/for developers)

Bloke wrote:

I still don’t see how this plugin helps search engines and would love someone to explain whether a sitemap is a statement of intent or fact and give some concrete examples of how it helps Googlebot index a site better; imo it does an admirable job on its own if you get your SEO right.

According to Google, a sitemap helps when:

  • Your site has dynamic content.
  • Your site has pages that aren’t easily discovered by Googlebot during the crawl process – for example, pages featuring rich AJAX or Flash.
  • Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn’t well linked, it may be hard for us to discover it.)
  • Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.

The last point makes sense to me. On one site I developed, the archive is the only way to find other articles – the authors never link to each other.

As for features, would it make sense to get the plugin to learn and adapt the priority/frequency based on frequency of updates in a section?

That sounds like a good idea for section frequency. I do agree with trenc’s point about lastmod at least for articles. Setting a high changefreq/article is a little reminiscent of the revisit-after meta tag.

Also, without setting priority there’s no way for bots to judge the relative importance of pages or sections so people might as well leave the bot alone to do its crawly stuff without trying to give it misguided hints.

Yeah after a little more reading, it seems the only time it would play a role is for pages of similar content. Even then, I strongly doubt an arbitrary priority is going to affect the crawler. Since it’s optional, I’m not going to include it in the release.

Shoot me a mail if you wanna chat about what I tried (it wasn’t much, but might help).

I probably will soon :). I’m starting from scratch initially (code-onset OCD), then I’ll take a look at the existing asy_sitemap for ideas.

trenc wrote:

Permanently pinging Google with the sitemap.xml (or its gz-compressed version) is not needed.

Hey, that’s good to know. I saw the robots.txt mention in the protocol, so that seems like a good way to go (or w/webmaster tools). I haven’t used webmaster tools in a few years, but doesn’t it let you point to a sitemap? Seems that would eliminate the need for any kind of pinging mechanism.

Offline
