Textpattern CMS support forum
...Avoid Duplicate content?
Hi – I was just in a discussion on another forum about RSS feeds being duplicate content, which rang alarm bells in my head…
So I have added this to my robots.txt
User-agent: *
Disallow: /rss/
Disallow: /atom/
but it got me thinking – is there anything else I would need to block with robots.txt?
I know in WordPress there are a whole lot of things that need to be blocked, such as its archiving system.
Re: ...Avoid Duplicate content?
I very much doubt XML feeds are viewed by Google as duplicate content.
#3 2007-06-09 20:50:24
- Andrew (Plugin Author)
Re: ...Avoid Duplicate content?
Additionally, if you block all robots from viewing your feeds, you’re also blocking Google (and Y/M) from crawling them, which is used to include your site content in things like Google Blog Search. I’d be more worried about having multiple pages on my site itself with the same content – an individual article page versus paginated pages that display multiple articles, or archive pages that use full content rather than excerpts.
When it comes to other sites scraping your site and republishing your content, most search engines follow a “first rights” style ownership system, meaning that if you published it first, then all other publishers of the same content are filtered. You get ownership rights.
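In Textpattern that mostly comes down to using <txp:excerpt /> rather than <txp:body /> wherever a page shows a list of articles. Something along these lines in the article form (just a rough sketch – the exact markup will vary per site):
<txp:if_article_list>
    <!-- list/archive context: link plus excerpt only, so the full text lives at a single URL -->
    <h3><txp:permlink><txp:title /></txp:permlink></h3>
    <txp:excerpt />
<txp:else />
    <!-- individual article page: full body -->
    <h1><txp:title /></h1>
    <txp:body />
</txp:if_article_list>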
#4 2007-06-11 03:12:23
- Mary (Sock Enthusiast)
Re: ...Avoid Duplicate content?
Ditto to what Jeremie said. They’re a long-existing, standard way of conveying content by alternate means. Just like “Printer Friendly” pages.
Re: ...Avoid Duplicate content?
But Printer Friendly pages are viewed as duplicate content and need to be blocked via robots.txt – same content, two URLs.
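For example, assuming (purely for illustration) the printer-friendly copies lived under a /print/ path, the rule would be the same shape as the feed ones above:
User-agent: *
# hypothetical path where printer-friendly duplicates would live
Disallow: /print/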
#6 2007-06-11 11:57:00
- Mary (Sock Enthusiast)
Re: ...Avoid Duplicate content?
Adam Lasnik talks about paid links, duplicate content and more
That depends on what your specific fear is. You’re not penalized for them, if that’s what you’re thinking – which is what I was referring to. It may be an SEO problem, depending on multiple factors.
Either way, I don’t think it’s a considerable enough concern to be really ‘concerned’ about, until you’re directly affected by it (in which case it is easily rectified).
Re: ...Avoid Duplicate content?
Thanks Mary – that’s a really interesting article.
Re: ...Avoid Duplicate content?
tye wrote:
But Printer Friendly pages are viewed as duplicate content and need to be blocked via robots.txt – same content, 2 urls
Because, contrary to what Mary said, “printer friendly” is duplicate content. Technically, an XML feed is too, but it isn’t viewed as such.
“Printer friendly” is just a bad idea made into a bad tool for websites that don’t use CSS, or that predate real CSS usage. It’s a thing of the past, and was never formalized or standardized.
XML feeds are standardized, easily identified, and well recognized by Googlebot and the like.
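For instance, the standard feed autodiscovery links in a page’s <head> explicitly flag them as alternate representations of the same content (the URLs here are just the ones from the first post):
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss/" />
<link rel="alternate" type="application/atom+xml" title="Atom" href="/atom/" />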