| author | 2024-06-23 15:34:21 +0200 | |
|---|---|---|
| committer | 2024-06-23 15:34:21 +0200 | |
| commit | 4604224c4d4d286d4f4e860e3c7fe7c61a4a2452 (patch) | |
| tree | a2888e451b92eb25c11efc257f9603d093c92c6f | |
| parent | [bugfix] add Date and Message-ID headers for email (#3031) (diff) | |
[chore] Update our robots.txt (#3033)
This syncs our copy with the current state of the ai.robots.txt
repository. Upstream has tightened their scope to be AI-only, whereas
before it included a bunch of SEO and "web intelligence" marketing
stuff. I've kept those but moved them into their own section.
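
The split into two sections relies on robots.txt grouping: consecutive `User-agent` lines share the rules that follow them, and a `User-agent` line after a rule starts a new group. Below is a minimal sketch of that grouping in Go, not code from this repository. The names `sampleRobots`, `group`, and `parseGroups` are hypothetical, the sample text is a shortened excerpt, and the parser only handles `User-agent` and `Disallow` lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleRobots is a shortened, hypothetical excerpt in the same shape as the
// text changed by this commit: two groups, each closed by its own "Disallow: /".
const sampleRobots = `User-agent: GPTBot
User-agent: YouBot
Disallow: /

# Marketing/SEO "intelligence" data scrapers
User-agent: AwarioRssBot
User-agent: Seekr
Disallow: /
`

// group is one robots.txt group: consecutive User-agent lines plus the
// Disallow rules that follow them.
type group struct {
	agents   []string
	disallow []string
}

// parseGroups splits robots.txt text into groups. It is a simplification:
// only User-agent and Disallow lines are understood, and comments and blank
// lines are skipped.
func parseGroups(text string) []group {
	var groups []group
	startNew := true // the next User-agent line opens a new group
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			if startNew {
				groups = append(groups, group{})
				startNew = false
			}
			g := &groups[len(groups)-1]
			g.agents = append(g.agents, value)
		case "disallow":
			if len(groups) == 0 {
				continue // rule with no preceding User-agent line
			}
			g := &groups[len(groups)-1]
			g.disallow = append(g.disallow, value)
			startNew = true // a rule closes the run of User-agent lines
		}
	}
	return groups
}

func main() {
	for i, g := range parseGroups(sampleRobots) {
		fmt.Printf("group %d: agents=%v disallow=%v\n", i+1, g.agents, g.disallow)
	}
}
```

Because each section ends in its own `Disallow: /`, moving an agent from one section to the other changes only how the file is organized, not whether that agent is blocked.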
| -rw-r--r-- | internal/web/robots.go | 22 | 
1 files changed, 14 insertions, 8 deletions
diff --git a/internal/web/robots.go b/internal/web/robots.go
index 58b541413..9ecf58182 100644
--- a/internal/web/robots.go
+++ b/internal/web/robots.go
@@ -34,32 +34,38 @@ const (
 User-agent: AdsBot-Google
 User-agent: Amazonbot
 User-agent: anthropic-ai
-User-agent: Applebot
-User-agent: AwarioRssBot
-User-agent: AwarioSmartBot
+User-agent: Applebot-Extended
 User-agent: Bytespider
 User-agent: CCBot
 User-agent: ChatGPT-User
 User-agent: ClaudeBot
 User-agent: Claude-Web
 User-agent: cohere-ai
-User-agent: DataForSeoBot
+User-agent: Diffbot
 User-agent: FacebookBot
 User-agent: FriendlyCrawler
 User-agent: Google-Extended
 User-agent: GoogleOther
 User-agent: GPTBot
-User-agent: ImagesiftBot
-User-agent: magpie-crawler
-User-agent: Meltwater
+User-agent: img2dataset
 User-agent: omgili
 User-agent: omgilibot
 User-agent: peer39_crawler
 User-agent: peer39_crawler/1.0
 User-agent: PerplexityBot
+User-agent: YouBot
+Disallow: /
+
+# Marketing/SEO "intelligence" data scrapers
+User-agent: AwarioRssBot
+User-agent: AwarioSmartBot
+User-agent: DataForSeoBot
+User-agent: ImagesiftBot
+User-agent: magpie-crawler
+User-agent: Meltwater
 User-agent: PiplBot
+User-agent: scoop.it
 User-agent: Seekr
-User-agent: YouBot
 Disallow: /
 
 # Well-known.dev crawler. Indexes stuff under /.well-known.
