
Textpattern CMS support forum


#1 2007-06-09 01:37:33

tye
Member
From: Pottsville, NSW
Registered: 2005-07-06
Posts: 859
Website

...Avoid Duplicate content?

Hi – I was just in a discussion on another forum about RSS feeds being treated as duplicate content, which rang alarm bells in my head…

So I have added this to my robots.txt

User-agent: *
Disallow: /rss/
Disallow: /atom/

But it got me thinking – is there anything else I would need to block with robots.txt?

I know in WordPress there are a whole lot of things that need to be blocked, such as its archiving system.
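For reference, rules like the ones above can be sanity-checked with Python’s standard urllib.robotparser (the article path below is just a made-up example, not a real Textpattern URL):

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt above.
rules = """\
User-agent: *
Disallow: /rss/
Disallow: /atom/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Feed URLs are blocked for every user agent...
print(parser.can_fetch("*", "/rss/"))    # False
print(parser.can_fetch("*", "/atom/"))   # False

# ...while ordinary article pages stay crawlable.
print(parser.can_fetch("*", "/articles/hello-world"))  # True
```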


#2 2007-06-09 09:08:09

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578
Website

Re: ...Avoid Duplicate content?

I very much doubt XML feeds are viewed by Google as duplicate content.


#3 2007-06-09 20:50:24

Andrew
Plugin Author
Registered: 2004-02-23
Posts: 730

Re: ...Avoid Duplicate content?

Additionally, if you block all robots from viewing your feeds, you’re also blocking Google (and Yahoo/MSN) from crawling them, which is how your site content gets included in things like Google Blog Search. I’d be more worried about having multiple pages on my site itself with the same content, like an individual article page vs. paginated pages that display multiple articles, or archive pages that use full content rather than excerpts.

When it comes to other sites scraping your site and republishing your content, most search engines follow a “first rights” style ownership system, meaning that if you published it first, then all other publishers of the same content are filtered. You get ownership rights.


#4 2007-06-11 03:12:23

Mary
Sock Enthusiast
Registered: 2004-06-27
Posts: 6,236

Re: ...Avoid Duplicate content?

Ditto to what Jeremie said. They’re a long-existing, standard way of conveying content by alternate means. Just like “Printer Friendly” pages.


#5 2007-06-11 05:55:01

tye
Member
From: Pottsville, NSW
Registered: 2005-07-06
Posts: 859
Website

Re: ...Avoid Duplicate content?

But Printer Friendly pages are viewed as duplicate content and need to be blocked via robots.txt – same content, two URLs.
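For example, if the printer-friendly versions all lived under a hypothetical /print/ path (just an illustration, not Textpattern’s actual URL scheme), the robots.txt entry would be:

```
User-agent: *
Disallow: /print/
```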


#6 2007-06-11 11:57:00

Mary
Sock Enthusiast
Registered: 2004-06-27
Posts: 6,236

Re: ...Avoid Duplicate content?

Adam Lasnik talks about paid links, duplicate content and more

That depends upon what your specific fear is. You’re not penalized for them, if that’s what you’re thinking; that’s what I was referring to. It may be an SEO problem, depending upon multiple factors.

Either way, I don’t think it’s a considerable enough concern to be really ‘concerned’ about, until you’re directly affected by it (in which case it is easily rectified).


#7 2007-06-12 00:22:28

tye
Member
From: Pottsville, NSW
Registered: 2005-07-06
Posts: 859
Website

Re: ...Avoid Duplicate content?

Thanks Mary – that’s a really interesting article.


#8 2007-06-12 00:35:09

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578
Website

Re: ...Avoid Duplicate content?

tye wrote:

But Printer Friendly pages are viewed as duplicate content and need to be blocked via robots.txt – same content, two URLs.

Because, contrary to what Mary said, “printer friendly” is duplicate content. Technically, an XML feed is too, but it isn’t viewed as such.

“Printer friendly” is just a bad idea made into a bad tool for websites that don’t use CSS, or that predate real CSS usage. It’s a thing of the past, and it was never formalized or standardized.

XML feeds are standardized, easily identified, and readily recognized by Googlebot and the like.



Powered by FluxBB