Textpattern CMS support forum

#13 2010-10-15 05:11:21

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,254

Re: Googlebot goes berserk on my site

Did you consider adding some meta information?

On the 404 page, add:

<meta name="revisit-after" content="30 days" />
<meta name="robots" content="noindex,follow" />

On the article list pages:

<meta name="revisit-after" content="7 days" />
<meta name="robots" content="index,follow" />

For individual articles:

<meta name="revisit-after" content="60 days" />
<meta name="robots" content="index,follow" />

Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.


#14 2010-10-15 12:33:11

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

Gocom wrote:

Google’s own Robots.txt manual is pretty helpful.

It might be that the rules are failing because the parameter “all” doesn’t have a value. I didn’t check, so don’t quote me on that.

Anyhow, to cut down the number of rules, as there are plenty, you could just block URLs that contain a question mark. For example, something like this might work (not tested):

User-agent: *
Disallow: /*?
Disallow: /present/
Disallow: /featured-images/
Disallow: /explore/
Disallow: /checkout/
Disallow: /confirmation/

If it works, it would also block some other unwantedish pages, like ?id=, ?q= and ?s= which are part of TXP’s URL structure.

Thanks, Jukka.

I had somehow overlooked the bottom of the “Manually create a robots.txt file” section in Google’s robots.txt manual. Thanks for directing my attention to it.

Thanks also for the suggestion to use Disallow: /*?. However, this would also block many of my legitimate pages, as well as ?rah_sitemap=sitemap.

I am now trying the following:

User-agent: *
Allow: /
Disallow: /*keywords=*
Disallow: /*place=*
Disallow: /*artist=*
Disallow: /*period=*
Disallow: /*year=*
Disallow: /*c=*
Disallow: /present/
Disallow: /featured-images/
Disallow: /explore/
Disallow: /checkout/
Disallow: /confirmation/

Google’s explanations of Googlebot seem to imply that this should work, but I am not very confident.
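For anyone who wants to check rules like these without waiting for the crawler, here is a minimal Python sketch of Google-style pattern matching as I understand it from Google’s robots.txt documentation: * matches any run of characters, a trailing $ anchors the end of the URL, and when several rules match, the longest pattern wins (with Allow winning a tie). The function names rule_matches and is_allowed and the sample paths are made up for illustration; this is not Googlebot’s actual code.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Google-style robots.txt pattern matches the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally, then turn each '*' into "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, like robots.txt prefix matching.
    return re.match(regex, path) is not None

def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", pattern). Longest matching pattern wins."""
    best = ("allow", "")  # no matching rule means the path may be crawled
    for kind, pattern in rules:
        if not rule_matches(pattern, path):
            continue
        longer = len(pattern) > len(best[1])
        tie_for_allow = len(pattern) == len(best[1]) and kind == "allow"
        if longer or tie_for_allow:
            best = (kind, pattern)
    return best[0] == "allow"

# A shortened version of the rule set above.
RULES = [
    ("allow", "/"),
    ("disallow", "/*keywords=*"),
    ("disallow", "/*place=*"),
    ("disallow", "/explore/"),
]

# Hypothetical sample paths, not taken from the real site.
SAMPLES = ["/?keywords=kimono", "/photos/?place=tokyo", "/explore/",
           "/?rah_sitemap=sitemap", "/about"]

for path in SAMPLES:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions, Allow: / does not cancel the Disallow lines, because each Disallow pattern is longer than "/". The trailing * is also redundant: rules are prefix matches, so /*keywords= behaves the same as /*keywords=*.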

Last edited by Kjeld (2010-10-15 12:33:45)


Old Photos of Japan – Japan in the 1850s~1960s (100% txp)
MeijiShowa – Stock photos of Japan in the 1850s~1960s (100% txp)
JapaneseStreets.com – Japanese street fashion (mostly txp)


#15 2010-10-15 12:34:56

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

colak wrote:

Did you consider adding some meta information?

Very basic information to add, but I had not considered it at all. Thanks!



#16 2010-10-15 12:42:28

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533

Re: Googlebot goes berserk on my site

Kjeld wrote:

and ?rah_sitemap=sitemap

Rah_sitemap also automatically answers at example.com/sitemap.xml and example.com/sitemap.xml.gz, if you prefer clean URLs :-) Here example.com stands for your own domain.

You could also add a Sitemap: row to your robots.txt to tell bots that the sitemap is in fact there, even though Googlebot should check it automatically from time to time.
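As a rough illustration of the Sitemap: row, the sketch below feeds a small robots.txt to Python’s standard urllib.robotparser and reads the sitemap URL back out. The file contents and the example.com URL are placeholders, not the real site’s file, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; swap example.com for your own domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /present/
Disallow: /checkout/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap rows apply to the whole file, not to a single User-agent group.
sitemaps = parser.site_maps()
print(sitemaps)

# Plain path prefixes are understood by this parser as well.
print(parser.can_fetch("*", "https://example.com/checkout/"))
```

As far as I know, this standard-library parser only does plain prefix matching, so it is fine for checking the Sitemap: row but should not be relied on to evaluate wildcard rules such as /*keywords=*.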

Last edited by Gocom (2010-10-15 12:47:23)


#17 2010-10-15 23:13:40

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

Gocom wrote:

You could also add a Sitemap: row to your robots.txt to tell bots that the sitemap is in fact there, even though Googlebot should check it automatically from time to time.

Thanks, Jukka, for this great tip, too. Have added it to my robots.txt file.



#18 2010-10-15 23:19:15

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

Yesterday, as I mentioned in an earlier post, I started using the following lines in my robots.txt:

Disallow: /*keywords=*
Disallow: /*place=*
Disallow: /*artist=*
Disallow: /*period=*
Disallow: /*year=*
Disallow: /*c=*

So far, googlebot hasn’t touched these links yet. But it has only been half a day and googlebot just looked at a few links, so it’s wait and see for now.



#19 2010-10-16 19:08:05

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,482

Re: Googlebot goes berserk on my site

Kjeld wrote:
So far, googlebot hasn’t touched these links yet. But it has only been half a day and googlebot just looked at a few links, so it’s wait and see for now.

Did you set up the crawl frequency in Webmaster Tools yet? Site Configuration -> Settings -> Crawl Rate -> Set Custom Crawl Rate.


#20 2010-10-16 23:23:17

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

gaekwad wrote:

Did you set up the crawl frequency in Webmaster Tools yet? Site Configuration -> Settings -> Crawl Rate -> Set Custom Crawl Rate.

Thanks, Pete. I didn’t, actually, but I have no problems with the current crawl rate. More than 30 hours after I last adjusted robots.txt, Googlebot (or any other robot) still has not touched the URLs that I disallowed with the double asterisks (Disallow: /*keywords=*), so it seems to be working.

Thanks to all for the tips and advice!



#21 2010-10-20 22:38:37

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

Update: the strategy below definitely works. Googlebot has left all links that include the strings between the asterisks completely alone during the past week. I can recommend it for your site. Thanks again to all for the input on this!

Disallow: /*keywords=*
Disallow: /*place=*
Disallow: /*artist=*
Disallow: /*period=*
Disallow: /*year=*
Disallow: /*c=*
Disallow: /*q=*
Disallow: /*s=*
Disallow: /*id=*
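One caveat worth checking with the shorter rules: because * matches anything, a pattern such as /*c=* blocks every URL that contains the text "c=", including parameters whose names merely end in "c". The Python sketch below illustrates this under the same matching assumptions Google documents (wildcard *, everything else literal). The ?topic= parameter and the tighter /*?c= and /*&c= variants are hypothetical examples, not rules tested in this thread.

```python
import re

def matches(pattern: str, path: str) -> bool:
    """Google-style match: '*' is a wildcard, all other characters are literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start of the path (robots.txt rules are prefixes).
    return re.match(regex, path) is not None

broad = "/*c=*"               # rule from the list above
narrow = ["/*?c=", "/*&c="]   # hypothetical tighter variants

# Hypothetical paths; only the first two use a parameter actually named "c".
paths = ["/?c=landscape", "/?year=1900&c=landscape", "/?topic=kimono"]

for path in paths:
    by_broad = matches(broad, path)
    by_narrow = any(matches(rule, path) for rule in narrow)
    print(f"{path}: broad={by_broad} narrow={by_narrow}")
```

If no other parameter on the site ends in c, s, or id, the short rules are harmless; otherwise, anchoring on the literal ? or & is one way to limit a rule to the intended parameter.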

Last edited by Kjeld (2010-10-20 22:38:58)



#22 2010-10-21 23:04:40

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070

Re: Googlebot goes berserk on my site

Kjeld, roliviu smells a lot like a spammer :)
I’ll report the user.


Music will carry ideas and will always continue

TXP Builders – finely-crafted code, design and txp


#23 2010-10-21 23:17:49

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 453

Re: Googlebot goes berserk on my site

maniqui wrote:

Kjeld, roliviu smells a lot like a spammer :) I’ll report the user.

Thanks. I missed that. I have deleted my response.


