summaryrefslogtreecommitdiff
path: root/internal/web/robots.go
AgeCommit message (Collapse)AuthorFiles
2025-02-05[feature] Use `X-Robots-Tag` headers to instruct scrapers/crawlers (#3737)Libravatar tobi1
* [feature] Use `X-Robots-Tag` headers to instruct scrapers/crawlers * use switch for RobotsHeaders
2025-02-04[feature] Change `instance-stats-randomize` to `instance-stats-mode` with ↵Libravatar tobi1
multiple options; implement nodeinfo 2.1 (#3734) * [feature] Change `instance-stats-randomize` to `instance-stats-mode` with multiple options; implement nodeinfo 2.1 * swaggalaggadingdong
2025-02-03[chore] disallow /nodeinfo/ too (#3729)Libravatar jade arson.1
2025-01-03[chore] Update robots.txt with more AI bots (#3634)Libravatar Daenney1
2024-09-07[chore] More AI blocking (#3273)Libravatar Daenney1
2024-08-29[chore] Update robots.txt with more AI scrapers (#3248)Libravatar Daenney1
2024-08-02[feature] Beef up our AI opt-outs (#3165)Libravatar Daenney1
* [chore] Synchronise our robots.txt with upstream * [feature] Add headers to escape AI crawlers This adds 2 headers that a number of AI crawlers respect to signal that content should not be included in their datasets.
2024-07-10[choore] Update robots.txt (#3092)Libravatar Daenney1
Recategorises a pair of scrapers according to their use.
2024-06-23[chore] Update our robots.txt (#3033)Libravatar Daenney1
This syncs our copy with the current state of the ai.robots.txt repository. Upstream has tightened their scope to be AI-only, whereas before it included a bunch of SEO and "web intelligence" marketing stuff. I've kept those but moved them into their own section.
2024-04-22[chore] Update robots.txt (#2856)Libravatar Daenney1
This updates the robots.txt based on the list of the ai.robots.txt repository. We can look at automating that at some point. It's worth pointing out that some robots, namely the ones by Bytedance, are known to ignore robots.txt entirely.
2024-04-11[feature] New user sign-up via web page (#2796)Libravatar tobi1
* [feature] User sign-up form and admin notifs * add chosen + filtered languages to migration * remove stray comment * chosen languages schmosen schmanguages * proper error on local account missing
2024-02-27[feature] Block Amazonbot (#2692)Libravatar Daenney1
Blocks the Amazon crawler bot. Closes: #2686
2023-12-27[chore] Refactor HTML templates and CSS (#2480)Libravatar tobi1
* [chore] Refactor HTML templates and CSS * eslint * ignore "Local" * rss tests * fiddle with OG just a tiny bit * dick around with polls a bit more so SR stops saying "clickable" * remove break * oh lord * don't lazy load avatar * fix ogmeta tests * clean up some cruft * catch remaining calls to c.HTML * fix error rendering + stack overflow in tag * allow templating attributes * fix indent * set aria-hidden on status complementary content, since it's already present in the label anyway * tidy up templating calls a little * try to make styling a bit more consistent + readable * fix up some remaining CSS issues * fix up reports
2023-09-30[feature] Block a bunch of "AI" crawlers (#2239)Libravatar Daenney1
* [feature] Block Google Bard/AI crawlers * [feature] Block the other OpenAI crawler * [feature] Block Common Crawl crawler This is used in research, but also gleefully advertises itself as the training source used in all LLMs and GPT-3. Fixes: #2240 * [feature] Block Omgilikebot Used by some shady big web data engine company. * [feature] Block Meta's language model crawler * [feature] Block well-known.dev crawler
2023-08-08[chore] Update robots.txt, give chatgpt the middle finger (#2085)Libravatar tobi1
2023-03-12[chore] Improve copyright header handling (#1608)Libravatar Daenney1
* [chore] Remove years from all license headers Years or year ranges aren't required in license headers. Many projects have removed them in recent years and it avoids a bit of yearly toil. In many cases our copyright claim was also a bit dodgy since we added the 2021-2023 header to files created after 2021 but you can't claim copyright into the past that way. * [chore] Add license header check This ensures a license header is always added to any new file. This avoids maintainers/reviewers needing to remember to check for and ask for it in case a contribution doesn't include it. * [chore] Add missing license headers * [chore] Further updates to license header * Use the more common // indentend comment format * Remove the hack we had for the linter now that we use the // format * Add SPDX license identifier
2023-01-25[feature] Public list of suspended domains (#1362)Libravatar f0x521
* basic rendered domain blocklist (unauthenticated!) * style basic domain block list * better formatting for domain blocklist * add opt-in config option for showing suspended domains * format/linter * re-use InstancePeersGet for web-accessible domain blocklist * reword explanation, border styling * always attach blocklist handler, update error message * domain blocklist error message grammar
2023-01-05[chore] Update/add license headers for 2023 (#1304)Libravatar tobi1
2023-01-02[chore] The Big Middleware and API Refactor (tm) (#1250)Libravatar tobi1
* interim commit: start refactoring middlewares into package under router * another interim commit, this is becoming a big job * another fucking massive interim commit * refactor bookmarks to new style * ambassador, wiz zeze commits you are spoiling uz * she compiles, we're getting there * we're just normal men; we're just innocent men * apiutil * whoopsie * i'm glad noone reads commit msgs haha :blob_sweat: * use that weirdo go-bytesize library for maxMultipartMemory * fix media module paths
2022-09-29[feature] Add `meta robots` tag; allow robots to index profile card if user ↵Libravatar tobi1
is Discoverable (#842) * rework robots.txt response * don't let robots snippet from statuses/threads * allow robots to index if user is Discoverable * add license text