Textpattern CMS support forum

#1 2024-02-28 02:57:13

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,110
Website

Blocking AI bots and crawlers

As AI stays in the news, and not in a good way, here are some ways to try to protect your sites
(there is never any certainty that those bots will respect the directives they themselves suggest… caveat emptor). The list below is based on data from this Reuters Institute article and the links therein:

# AI spiders & crawlers
# https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# https://platform.openai.com/docs/gptbot
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
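
One way to sanity-check a list like the one above before deploying it is to feed it to Python's standard `urllib.robotparser`. This is only a sketch: the `ROBOTS_TXT` constant, the `is_blocked` helper and the example.com URL are made up for illustration, and it only shows how a well-behaved parser reads the rules, not what any given bot will actually do.

```python
from urllib.robotparser import RobotFileParser

# The directives from the post above, copied verbatim.
# ROBOTS_TXT is a hypothetical name used only in this sketch.
ROBOTS_TXT = """\
# AI spiders & crawlers
# https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# https://platform.openai.com/docs/gptbot
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
"""


def is_blocked(user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if the rules above disallow `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Googlebot is included to confirm that regular search crawling is untouched.
    for agent in ("GPTBot", "CCBot", "Google-Extended", "Googlebot"):
        print(agent, "blocked" if is_blocked(agent) else "allowed")
```

Because the file has no `User-agent: *` group, any agent that is not listed stays allowed.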

Please share additional possibilities.


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

#2 2024-03-02 02:08:44

phiw13

Re: Blocking AI bots and crawlers

More bots & crawlers can be found in the New York Times robots.txt (towards the end, before the sitemap list; or search for “Amazonbot”).


#3 2024-06-12 06:00:56

phiw13

Re: Blocking AI bots and crawlers

Apple has now documented their AI bot. See: support.apple.com/en-us/119829

With Applebot-Extended, web publishers can choose to opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.

User-agent: Applebot-Extended
Disallow: /

#4 2024-06-13 22:09:47

skewray
Member
From: Sunny Southern California
Registered: 2013-04-25
Posts: 167
Website

Re: Blocking AI bots and crawlers

I’ve been using this list

#5 2024-06-13 22:42:49

phiw13

Re: Blocking AI bots and crawlers

skewray wrote #337281:

I’ve been using this list

Interesting list. Thanks for that.

I wonder why Applebot is included. That is just a regular search bot, like Googlebot, Bing, Yandex, etc.


#6 2024-06-14 00:40:33

skewray

Re: Blocking AI bots and crawlers

phiw13 wrote #337282:

Interesting list. Thanks for that.

I wonder why Applebot is included. That is just a regular search bot, like Googlebot, Bing, Yandex, etc.

AppleBot is seeking fodder for Siri, which is an AI. See here

Last edited by skewray (2024-06-14 00:41:18)

#7 2024-06-14 01:46:57

phiw13

Re: Blocking AI bots and crawlers

Uh… that is a bit of a stretch, but OK. But then you also need to block all of the Googlebot(s), Bingbot(s), etc., as all of them seek food for their related AI tools.


#8 2024-06-14 03:55:49

skewray

Re: Blocking AI bots and crawlers

phiw13 wrote #337284:

Uh… that is a bit of a stretch, but OK. But then you also need to block all of the Googlebot(s), Bingbot(s), etc., as all of them seek food for their related AI tools.

Supposedly, Google-Extended and/or GoogleOther are bots for AI input, while GoogleBot is not. Supposedly. No clue about Bingbot, though.

#9 2024-06-14 04:43:31

phiw13

Re: Blocking AI bots and crawlers


#10 2024-06-20 18:56:29

skewray

Re: Blocking AI bots and crawlers

On this subject, Perplexity AI

So, we can all block 44.221.181.252 also.

#11 2024-06-21 00:32:03

phiw13

Re: Blocking AI bots and crawlers

skewray wrote #337325:

On this subject, Perplexity AI

So, we can all block 44.221.181.252 also.

I saw that yesterday. That Perplexity AI is one hell of a greedy [family-friendly-censored] pig. You can block the IP and tomorrow they change it…

Some people are adding 403 blocking to their .htaccess or nginx.conf file (UA-string-based blocking; see e.g. Ethan Marcotte). That might work for some bots, but the robots.txt file is still needed for others (Google-Extended for sure, Applebot-Extended maybe). And “might” does quite a bit of work there!
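
For sites served from a Python application rather than through .htaccess or nginx.conf, the same UA-string idea can be sketched as a small WSGI wrapper. Everything named here (`BLOCKED_UA`, `block_ai_bots`, `hello_app`) is hypothetical, the token list is simply taken from this thread, and whether each token really shows up in a request's User-Agent header is an assumption. It only stops bots that identify themselves honestly.

```python
import re

# Tokens mentioned in this thread; treating them as substrings of the
# User-Agent header is an assumption, not something the vendors guarantee.
BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|CCBot|anthropic-ai|Claude-Web",
    re.IGNORECASE,
)


def block_ai_bots(app):
    """Wrap a WSGI app and answer 403 when the User-Agent matches BLOCKED_UA."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_ai_bots(hello_app)
```

Google-Extended and Applebot-Extended are left out of the pattern on purpose, since the post above treats those as robots.txt matters.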


#12 2024-07-24 06:20:56

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,617
GitHub Twitter

Re: Blocking AI bots and crawlers

See also here: https://spawning.ai/ai-txt.


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.
