Some FOSS websites have started using a Proof of Work CAPTCHA called Anubis because they’ve been getting hit by crawlers gathering data for LLMs. It seems to help.

It does not stop them, but it does make it more expensive and slower for the attacker. At the moment, I haven’t seen any instance having this problem, but it most likely will be a problem someday, and being prepared for it is definitely not a bad thing.

Lemmy could benefit from this by placing invisible or automatic PoW CAPTCHAs in front of actions like commenting, posting, etc.

  • poVoq@slrpnk.net
    20 hours ago

    It does not stop them, but it does make it more expensive and slower for the attacker.

    This is a bit of a misconception about what Anubis does. It uses PoW to enforce a full browser environment, but the PoW is only run once a week or so (or when something suspicious is detected). The PoW is then used to autogenerate a kind of password that is stored in the browser cookies, and to generate this “password” you can’t use the simple servers that are used at scale to scrape (practically DDoS) the open internet right now.

    The main problem is with complex websites like git forges, where these AI scrapers hit all the computationally expensive deep endpoints and practically force the sites to shut down by overloading the CPU.

    Since I was forced to implement Anubis for my Forgejo instance, I also experimented with it on Lemmy. Right now the results show that while Lemmy isn’t as badly affected by this AI scraping, there is still quite a bit of it happening. After adding Anubis, the overall traffic went down by about a third on our instance, and it prevents the regular traffic spikes we previously saw and had no real explanation for.

    But we also ran into some strange issues with it. Most likely they are caused by Anubis flagging mobile connections with switching IP addresses as possible scrapers (which are known to first access pages from a more complete server to get cookies and so on, and then switch to a cheaper server on a different IP to do the actual scraping). But we are still figuring out how to replicate those issues, and they might have been fixed in the latest Anubis update we applied yesterday.

  • Nothing4You@programming.dev
    20 hours ago

    slrpnk.net has some first-hand experience with this, as @poVoq@slrpnk.net already deployed anubis in front of lemmy-ui.

    it wouldn’t be that complicated to add it to lemmy-ansible if people are interested in having the option.

    i don’t see the argument for having this before user interaction though; the main goal of this is to fight malicious crawlers. for authenticated users, solutions like this are completely unnecessary as these can simply and much more efficiently be addressed through rate limits without putting users on low end hardware at a disadvantage and contributing to global warming.

    • poVoq@slrpnk.net
      20 hours ago

      Yes, and so far only minor issues that are hard to replicate. Thanks again for helping us track down the final issue with the setup a few weeks ago.

      I agree that it would make more sense to only enable it for unauthenticated visitors, but that seems a bit hard to do with external software like Anubis.

      • Nothing4You@programming.dev
        20 hours ago

        I didn’t mean only showing Anubis to unauthenticated users; this was in response to OP suggesting adding it before posting or commenting, which would be the opposite of removing it for authenticated users.

        • poVoq@slrpnk.net
          19 hours ago

          Ah, ok. Yes, that kinda makes sense if you think of Anubis as a CAPTCHA equivalent, but it really isn’t, as I tried to explain in my other post.

  • drspod@lemmy.ml
    20 hours ago

    I frequently access Lemmy through quite old hardware, and I’d be a bit worried that these PoW scripts would make the site unusably slow for me.

    • poVoq@slrpnk.net
      19 hours ago

      You might have to sit through a slightly longer waiting time every now and then, but Anubis is not invoked on every connection, and once your browser is found to be worthy you can surf as before.

      The bigger issue might be if that old hardware can’t run a modern, up-to-date browser, because then it doesn’t work at all, which is the real downside of Anubis.

      I tried it with the default settings of the Tor browser though, and surprisingly that worked ok.