| author | 2025-04-28 20:12:27 +0000 |
|---|---|
| committer | 2025-04-28 20:12:27 +0000 |
| commit | d8c4d9fc5a62741f0c4c2b692a3a94874714bbcc (patch) |
| tree | b64e5f1a635149db4b549fecd09437e9874572ad /docs/admin/robots.md |
| parent | [chore/docs] add symmetry to the politics (#4081) (diff) |
| download | gotosocial-d8c4d9fc5a62741f0c4c2b692a3a94874714bbcc.tar.xz |
[feature] proof of work scraper deterrence (#4043)
This adds a proof-of-work-based scraper deterrence to GoToSocial's middleware stack on profile and status web pages. Heavily inspired by https://github.com/TecharoHQ/anubis, but massively stripped back for our own use case.
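At its core, Anubis-style proof-of-work deterrence works like this: the server hands the browser a challenge string and a difficulty, the browser brute-forces a nonce until a SHA-256 digest of challenge + nonce meets the difficulty, and the server checks the result with a single hash. A minimal Go sketch of that loop follows; the function names and the leading-zero-hex difficulty scheme are illustrative assumptions, not the actual implementation in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce until sha256(challenge + nonce), hex-encoded,
// starts with `difficulty` zero characters. This is the expensive step
// that runs in the visitor's browser.
func solve(challenge string, difficulty int) uint64 {
	prefix := strings.Repeat("0", difficulty)
	for nonce := uint64(0); ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
		if strings.HasPrefix(hex.EncodeToString(sum[:]), prefix) {
			return nonce
		}
	}
}

// verify is the cheap server-side step: one hash to check the submitted
// nonce before letting the request through (e.g. by setting a cookie).
func verify(challenge string, nonce uint64, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
}

func main() {
	challenge := "example-challenge"
	nonce := solve(challenge, 4) // ~16^4 hashes on average
	fmt.Println(nonce, verify(challenge, nonce, 4))
}
```

The asymmetry is the whole trick: the visitor pays tens of thousands of hashes once, while the server pays one hash per verification.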
Todo:
- ~~add configuration option so this is disabled by default~~
- ~~fix whatever weirdness is preventing this working with CSP (even in debug)~~
- ~~use our standard templating mechanism going through apiutil helper func~~
- ~~probably some absurdly small performance improvements to be made in pooling re-used hex encode / hash encode buffers~~ the web endpoints aren't as hot a path as the API / ActivityPub ones, so we'll leave this as-is for now; it's already very minimal and well optimized
- ~~verify the cryptographic assumptions re: using a portion of token as challenge data~~ this isn't a serious application of cryptography, and if it turns out to be a problem we'll fix it, but it should not be easily possible to guess a full SHA-256 hash from its first quarter, even if mathematically that might make it slightly easier (see the sketch after this list)
- ~~theme / make look nice??~~
- ~~add a spinner~~
- ~~add entry in example configuration~~
- ~~add documentation~~
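To make the "portion of token as challenge data" point above concrete, the idea is roughly the following. This is a hedged sketch only: the use of HMAC and the exact inputs are assumptions for illustration, not the code in this PR.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// token derives a per-visitor token from a server secret and some
// request-bound data (the inputs here are illustrative).
func token(secret, clientData string) []byte {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(clientData))
	return mac.Sum(nil)
}

func main() {
	tok := hex.EncodeToString(token("server-secret", "198.51.100.7|2025-04-28"))
	// Only the first quarter of the 64-char hex digest is exposed as
	// challenge data; recovering the remaining 48 chars from that prefix
	// would require a preimage-style attack on SHA-256.
	challenge := tok[:len(tok)/4]
	fmt.Println(challenge)
}
```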
Verification page originally based on https://github.com/LucienV1/powtect
Co-authored-by: tobi <tobi.smethurst@protonmail.com>
Reviewed-on: https://codeberg.org/superseriousbusiness/gotosocial/pulls/4043
Reviewed-by: tobi <tsmethurst@noreply.codeberg.org>
Co-authored-by: kim <grufwub@gmail.com>
Co-committed-by: kim <grufwub@gmail.com>
Diffstat (limited to 'docs/admin/robots.md')
| -rw-r--r-- | docs/admin/robots.md | 4 |
1 file changed, 1 insertion, 3 deletions
```diff
diff --git a/docs/admin/robots.md b/docs/admin/robots.md
index 3de4fe079..e4b3d27ce 100644
--- a/docs/admin/robots.md
+++ b/docs/admin/robots.md
@@ -10,8 +10,6 @@ You can allow or disallow crawlers from collecting stats about your instance fro
 The AI scrapers come from a [community maintained repository][airobots]. It's manually kept in sync for the time being. If you know of any missing robots, please send them a PR!
 
-A number of AI scrapers are known to ignore entries in `robots.txt` even if it explicitly matches their User-Agent. This means the `robots.txt` file is not a foolproof way of ensuring AI scrapers don't grab your content.
-
-If you want to block these things fully, you'll need to block based on the User-Agent header in a reverse proxy until GoToSocial can filter requests by User-Agent header.
+A number of AI scrapers are known to ignore entries in `robots.txt` even if it explicitly matches their User-Agent. This means the `robots.txt` file is not a foolproof way of ensuring AI scrapers don't grab your content. In addition to this you might want to look into blocking User-Agents via [requester header filtering](request_filtering_modes.md), and enabling a proof-of-work [scraper deterrence](scraper_deterrence.md).
 
 [airobots]: https://github.com/ai-robots-txt/ai.robots.txt/
```
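For reference, the request header filtering the updated docs point to amounts to rejecting requests by User-Agent before they reach the page handlers. A rough net/http sketch follows; the agent list and wiring are hypothetical illustrations, not GoToSocial's actual filter.

```go
package main

import (
	"net/http"
	"strings"
)

// blockedAgents is an illustrative list; a real deployment would draw on
// something like the ai.robots.txt repository referenced above.
var blockedAgents = []string{"GPTBot", "CCBot", "Bytespider"}

// filterUserAgents rejects requests whose User-Agent matches a known
// scraper before they reach the rest of the handler chain.
func filterUserAgents(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ua := r.UserAgent()
		for _, bad := range blockedAgents {
			if strings.Contains(ua, bad) {
				http.Error(w, "Forbidden", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	http.ListenAndServe(":8080", filterUserAgents(mux))
}
```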
