author Daenney <daenney@users.noreply.github.com> 2024-06-23 15:34:21 +0200
committer GitHub <noreply@github.com> 2024-06-23 15:34:21 +0200
commit 4604224c4d4d286d4f4e860e3c7fe7c61a4a2452
tree a2888e451b92eb25c11efc257f9603d093c92c6f /internal/web/robots.go
parent [bugfix] add Date and Message-ID headers for email (#3031)
[chore] Update our robots.txt (#3033)
This syncs our copy with the current state of the ai.robots.txt repository. Upstream has tightened their scope to be AI-only, whereas before it included a bunch of SEO and "web intelligence" marketing stuff. I've kept those but moved them into their own section.
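Splitting the list into two sections means each section needs its own `Disallow: /`, since under the usual robots.txt grouping rule a run of `User-agent` lines only picks up the rules that directly follow it. The sketch below is a minimal illustration of that rule, not code from this repository: `group`, `parseGroups`, and the shortened `sample` text are hypothetical names and data made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// group is one robots.txt record: consecutive User-agent lines
// followed by the rules that apply to all of them. (Hypothetical type.)
type group struct {
	agents []string
	rules  []string
}

// parseGroups splits robots.txt text into groups. A User-agent line that
// follows a rule line starts a new group. Comments and blank lines are
// skipped, so a blank line alone does not separate groups.
func parseGroups(txt string) []group {
	var groups []group
	sc := bufio.NewScanner(strings.NewReader(txt))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		key, val, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			// Open a new group if there is none yet, or if the current
			// group has already received at least one rule.
			if n := len(groups); n == 0 || len(groups[n-1].rules) > 0 {
				groups = append(groups, group{})
			}
			g := &groups[len(groups)-1]
			g.agents = append(g.agents, val)
		case "allow", "disallow":
			if n := len(groups); n > 0 {
				groups[n-1].rules = append(groups[n-1].rules, key+": "+val)
			}
		}
	}
	return groups
}

// sample is a shortened stand-in for the const edited in this commit.
const sample = `User-agent: PerplexityBot
User-agent: YouBot
Disallow: /

# Marketing/SEO "intelligence" data scrapers
User-agent: AwarioRssBot
User-agent: Seekr
Disallow: /
`

func main() {
	for i, g := range parseGroups(sample) {
		fmt.Printf("group %d: agents=%v rules=%v\n", i, g.agents, g.rules)
	}
}
```

If the first `Disallow: /` were left out, this parser would fold both sections into a single group. The end result for crawlers would be the same here, but the explicit rule keeps each section self-contained if one of them is later changed or removed.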
Diffstat (limited to 'internal/web/robots.go')
-rw-r--r-- internal/web/robots.go 22
1 files changed, 14 insertions, 8 deletions
diff --git a/internal/web/robots.go b/internal/web/robots.go
index 58b541413..9ecf58182 100644
--- a/internal/web/robots.go
+++ b/internal/web/robots.go
@@ -34,32 +34,38 @@ const (
User-agent: AdsBot-Google
User-agent: Amazonbot
User-agent: anthropic-ai
-User-agent: Applebot
-User-agent: AwarioRssBot
-User-agent: AwarioSmartBot
+User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
-User-agent: DataForSeoBot
+User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GPTBot
-User-agent: ImagesiftBot
-User-agent: magpie-crawler
-User-agent: Meltwater
+User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
User-agent: peer39_crawler
User-agent: peer39_crawler/1.0
User-agent: PerplexityBot
+User-agent: YouBot
+Disallow: /
+
+# Marketing/SEO "intelligence" data scrapers
+User-agent: AwarioRssBot
+User-agent: AwarioSmartBot
+User-agent: DataForSeoBot
+User-agent: ImagesiftBot
+User-agent: magpie-crawler
+User-agent: Meltwater
User-agent: PiplBot
+User-agent: scoop.it
User-agent: Seekr
-User-agent: YouBot
Disallow: /
# Well-known.dev crawler. Indexes stuff under /.well-known.