Textpattern CMS support forum
Blocking AI bots and crawlers
With AI staying in the news, and not in a good way, here are some ways to try to protect your sites (there is never any certainty that those bots will respect the directives they themselves suggest… caveat emptor). The list below is based on data from this Reuters Institute article and the links therein:
# AI spiders & crawlers
# https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# https://platform.openai.com/docs/gptbot
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
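For anyone who wants to sanity-check rules like the ones above before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. The `blocked_agents` helper and the `example.com` URL are illustrative names only, and the parser shows what a well-behaved bot should conclude, not what any given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as listed above, pasted in as a string for testing.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents that these rules forbid from fetching `url`.

    `url` is a placeholder; substitute a page on your own site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Googlebot is included as a control: nothing above should block it.
    print(blocked_agents(ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"]))
```

Including a regular search bot such as Googlebot in the list is a quick way to confirm the rules do not block more than intended.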
Please share additional possibilities.
Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern
Re: Blocking AI bots and crawlers
More bots & crawlers can be found in the NYTimes robots.txt (towards the end, before the sitemap list; or search for “Amazonbot”).
Re: Blocking AI bots and crawlers
Apple has now documented their AI bot. See: support.apple.com/en-us/119829
With Applebot-Extended, web publishers can choose to opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.
User-agent: Applebot-Extended
Disallow: /
Re: Blocking AI bots and crawlers
I’ve been using this list
Re: Blocking AI bots and crawlers
skewray wrote #337281:
I’ve been using this list
Interesting list. Thanks for that.
I wonder why Applebot is included. That is just a regular search bot, like Googlebot, Bing, Yandex, etc.
Re: Blocking AI bots and crawlers
Re: Blocking AI bots and crawlers
Uh… that is a bit of a stretch, but OK. But then you also need to block all of the Googlebot(s), Bingbot(s), etc., as all of them seek food for their related AI tools.
Re: Blocking AI bots and crawlers
phiw13 wrote #337284:
Uh… that is a bit of a stretch, but OK. But then you also need to block all of the Googlebot(s), Bingbot(s), etc., as all of them seek food for their related AI tools.
Supposedly, Google-Extended and/or GoogleOther are bots for AI input, while Googlebot is not. Supposedly. No clue about Bingbot, though.
Re: Blocking AI bots and crawlers
“Supposedly” !
This came up again today: searchengineland.com/google-extended-does-not-stop-google-search-generative-experience-from-using-your-sites-content-433058
Re: Blocking AI bots and crawlers
On this subject, Perplexity AI
So, we can all block 44.221.181.252 also.
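If the blocking happens in application code rather than in the firewall, a rough sketch with Python's standard `ipaddress` module could look like the following. `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical names, and the list holds only the single address mentioned above; any additional ranges would be your own assumption.

```python
import ipaddress

# Only the one address from this thread; a /32 network matches exactly that host.
BLOCKED_NETWORKS = [ipaddress.ip_network("44.221.181.252/32")]


def is_blocked_ip(remote_addr):
    """Return True if `remote_addr` falls inside a blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; this sketch simply does not block it.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)
```

Storing networks rather than bare strings means a whole range can be added later without changing the check.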
Re: Blocking AI bots and crawlers
skewray wrote #337325:
On this subject, Perplexity AI
So, we can all block 44.221.181.252 also.
I saw that yesterday. That Perplexity AI is one hell of a greedy [family-friendly-censored] pig. You can block the IP and tomorrow they change it…
Some people are adding 403 blocking to their .htaccess or nginx.conf file (UA-string based blocking, see e.g. Ethan Marcotte). That might work for some bots, but the robots.txt file is still needed for others (Google-Extended for sure, Applebot-Extended maybe). And “might” does quite a bit of work there!
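To illustrate the idea behind UA-string based blocking (this is not Ethan Marcotte's actual rules, and not .htaccess or nginx syntax), here is a minimal sketch as a Python WSGI middleware. `BLOCKED_UA_TOKENS` and `block_ai_bots` are hypothetical names, and the token list only covers crawlers from this thread that announce themselves in the User-Agent header.

```python
# Lowercased substrings to look for in the User-Agent header (assumed list).
BLOCKED_UA_TOKENS = ("gptbot", "chatgpt-user", "ccbot", "anthropic-ai", "claude-web")


def block_ai_bots(app):
    """Wrap a WSGI app so requests with a matching User-Agent get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_UA_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

A bot that sends a false or changed User-Agent passes straight through, which is the same limitation as with the server-config approach.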
Re: Blocking AI bots and crawlers
See also here: https://spawning.ai/ai-txt.
Patrick.
Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.