Textpattern CMS support forum
Do you have robots.txt file?
I was wondering if experienced users have a robots.txt file. I’m trying to understand what it does and have been reading up on it, but I’m still clueless as to its uses and whether it’s worth having one. Is there some default text that robots.txt should have, and does it make data vulnerable with regard to TXP? I don’t know why it would, but if a robot indexes (is that the right term?) pages, will it display things like user information?
Hope someone can clear this up for me.
Cheers
Re: Do you have robots.txt file?
A search-engine robot has the same or fewer capabilities than a normal user with a browser. It’s just less random and less interest-driven, and it tends to be very diligent in “clicking” on links.
The robots.txt file is a request from you to bots not to bother crawling or indexing the pages or paths listed in it. Good bots will honour those requests. If you don’t mind having all publicly accessible pages in search engines, you don’t need one. There is no harm in having one either.
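To make the “request, not a lock” point concrete, here is a minimal Python sketch of what a well-behaved bot does before following links. It assumes a hypothetical bot name (`ExampleBot`) and hypothetical URLs on `example.com`, and it feeds the rules straight to the standard library’s `urllib.robotparser` instead of downloading `/robots.txt`. The check lives entirely in the bot’s own code, which is why a badly behaved bot can simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /textpattern/
"""

AGENT = "ExampleBot"  # hypothetical crawler name


def allowed_urls(robots_txt: str, agent: str, candidates: list[str]) -> list[str]:
    """Return only the URLs a well-behaved bot would go on to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real bot would fetch /robots.txt first.
    parser.parse(robots_txt.splitlines())
    # The bot itself decides to skip disallowed URLs; nothing forces it to.
    return [url for url in candidates if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    links = [
        "https://example.com/articles/hello-world",
        "https://example.com/textpattern/index.php",
    ]
    for url in allowed_urls(ROBOTS_TXT, AGENT, links):
        print("would fetch:", url)
```

Running it prints only the article URL; the `/textpattern/` link is dropped because the bot chose to ask first.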
Re: Do you have robots.txt file?
I always put up a robots.txt file with one line:
User-agent: *
This keeps your Apache error log from filling up with entries for requests to a missing robots.txt, as does putting a favicon.ico in the root.
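As a quick check that the one-line file blocks nothing, this Python sketch parses it with the standard library’s `urllib.robotparser`, alongside a version with an empty `Disallow:` line. The original 1994 robots.txt convention expected at least one `Disallow` line per record, so the explicit form may be the safer choice for older bots. `AnyBot` and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def parser_for(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The one-line file from the post: a catch-all group with no rules at all.
one_liner = parser_for("User-agent: *")
# The same intent spelled out with an empty Disallow line.
explicit = parser_for("User-agent: *\nDisallow:")

for path in ["/", "/articles/some-post", "/images/photo.jpg"]:
    # Nothing is disallowed, so both parsers allow every path.
    print(path, one_liner.can_fetch("AnyBot", path), explicit.can_fetch("AnyBot", path))
```

Either way the file costs nothing in crawlability; its job here is just to answer the request with something other than a 404.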
We Love TXP . TXP Themes . TXP Tags . TXP Planet . TXP Make
Re: Do you have robots.txt file?
Thanks for the input, guys. You’ve made things much clearer.
#5 2006-08-30 18:59:41
- NyteOwl
- Member

- From: Nova Scotia, Canada
- Registered: 2005-09-24
- Posts: 539
Re: Do you have robots.txt file?
I use one to keep the spiders out of directories that don’t hold text content, such as images, scripts, etc. As always, only bots that obey the specification will bother checking robots.txt.
Obsolescence is just a lack of imagination. / 36-bits Forever! / #include <disclaimer.h>;
Re: Do you have robots.txt file?
I’ve heard that if you are using black-hat SEO techniques (for example, hiding large areas of keyword-rich content with CSS), keeping that CSS in an external style sheet and listing it in robots.txt as a file not to crawl might help keep you out of trouble.
Again, that would depend on robots honoring your robots.txt, which I imagine most of the major ones (Google, Yahoo, MSN) do, since internet privacy is such a touchy issue and logs can easily show which robots go where.
At the same time, I would question the effectiveness of those techniques. Does anyone have any thoughts about this?
Travel Atlas * Org | Start Somewhere
Re: Do you have robots.txt file?
This is my robots.txt. It tells all bots (those that actually use it) to wait 5 seconds between requests and not to index the files, images and textpattern folders.
Edit: It was tricky to find out how to get around Textile for code snippets (IMHO hashes shouldn’t be parsed in <code> blocks), and how do I turn off a code block started with “bc..”?
User-agent: *
Crawl-Delay: 5
Disallow: /files
Disallow: /images
Disallow: /textpattern
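The rules in this file can be checked with Python’s standard `urllib.robotparser`; this sketch uses a made-up bot name and made-up paths. It shows two things: `Crawl-delay` is read as a number of seconds (it is a non-standard extension, and some major crawlers ignore it), and `Disallow` values are path prefixes, so `Disallow: /files` also matches an unrelated URL such as the hypothetical `/files-explained`. A trailing slash (`/files/`) limits the rule to the folder itself.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 5
Disallow: /files
Disallow: /images
Disallow: /textpattern
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("AnyBot"))                    # requested seconds between requests
print(parser.can_fetch("AnyBot", "/images/logo.png"))  # inside a disallowed folder
print(parser.can_fetch("AnyBot", "/files-explained"))  # hypothetical URL caught by the /files prefix
print(parser.can_fetch("AnyBot", "/articles/hello"))   # hypothetical article URL
```

Run as-is, it prints `5`, `False`, `False` and `True`, one per line.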
Last edited by gunnar (2006-09-28 08:05:02)
Re: Do you have robots.txt file?
Here’s mine:
User-agent: *
Disallow: /textpattern/
Disallow: /files/

User-agent: Exabot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: ConveraCrawler
Disallow: /
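Bot-specific groups like these can be tested with Python’s standard `urllib.robotparser`. In this sketch (`Googlebot` and the paths are just sample inputs), a bot named in its own group follows that group, and every other bot falls back to the `*` group. Python’s parser matches a group name case-insensitively as a substring of the part of the user-agent string before the first `/`, which is its own matching rule rather than something this file depends on.

```python
from urllib.robotparser import RobotFileParser

# One catch-all group plus three bot-specific groups, as in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /textpattern/
Disallow: /files/

User-agent: Exabot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: ConveraCrawler
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Exabot", "/articles/hello"),            # named bot: shut out of everything
    ("Googlebot", "/articles/hello"),         # unnamed bot: falls back to the * group
    ("Googlebot", "/textpattern/index.php"),  # ...which still blocks the admin folder
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```

This prints `False`, `True` and `False` for the three checks, in that order.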
Last edited by colak (2006-09-28 10:07:00)
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
#9 2006-09-28 21:18:21
- NyteOwl
Re: Do you have robots.txt file?
Hiding keyword stuffing from the search engines is becoming almost impossible, so why bother trying? Build a decent site with good content and the underhanded tricks aren’t necessary anyway.
One cautionary note: do not put a directory or file in robots.txt that isn’t referenced somewhere else on your site, especially if you are trying to keep it private. Spiders only know about what is linked to AND anything listed in robots.txt, so listing an otherwise unlinked directory there advertises it.
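Because `/robots.txt` is served to anyone who asks, every `Disallow` line is public, which is the point of the caution above. This hypothetical Python helper (`disallowed_paths` is not part of any library) pulls those paths out the way a curious visitor might; the `/secret-drafts/` folder is invented for illustration. robots.txt is not access control, so a truly private folder needs protection on the server side rather than an entry here.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Hypothetical helper: list every non-empty Disallow path in a robots.txt."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


EXAMPLE = """\
User-agent: *
Disallow: /textpattern/
Disallow: /secret-drafts/   # hypothetical private folder, linked from nowhere
"""

print(disallowed_paths(EXAMPLE))
```

The output lists `/secret-drafts/` right next to `/textpattern/`, even though no page on the site links to it.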