Title.

I’ve noticed that the issues above are becoming increasingly common across the Fediverse. What’s being done to mitigate them?

  • Rimu@piefed.social
    2 hours ago

    Scrapers are not federating.

    ActivityPub could be used to harvest content on an ongoing basis, but it can’t be used to get all the historical data, which is what scrapers actually want. A Lemmy community’s outbox only contains its last 50 posts.
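    A minimal Python sketch of the “ongoing basis” point, under stated assumptions: `fetch_outbox` and `merge_outbox` are hypothetical helpers, the outbox URL shape (for example `https://example.org/c/somecommunity/outbox`) is only illustrative, and the code assumes the server returns an ActivityStreams `OrderedCollection` with `orderedItems` inline and doesn’t require a signed request. A listener polling this way only accumulates what passes through the outbox window while it is running; anything that dropped out before the first snapshot is never seen.

```python
import json
import urllib.request

# Media type commonly used to request ActivityPub JSON documents.
AP_ACCEPT = "application/activity+json"


def fetch_outbox(outbox_url: str) -> dict:
    """Fetch an actor's outbox as an ActivityStreams OrderedCollection.

    Hypothetical helper: assumes the server answers an unsigned GET and
    puts the recent activities directly in `orderedItems`.
    """
    req = urllib.request.Request(outbox_url, headers={"Accept": AP_ACCEPT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def merge_outbox(store: dict, collection: dict) -> int:
    """Merge one outbox snapshot into `store`, keyed by activity id.

    Returns the number of activities not seen before. Repeated polling
    grows `store` over time, but it can never recover activities that
    had already left the outbox window before the first poll.
    """
    new = 0
    for activity in collection.get("orderedItems", []):
        key = activity.get("id")
        if key is not None and key not in store:
            store[key] = activity
            new += 1
    return new
```

    Usage: call `fetch_outbox` on a timer and feed each result to `merge_outbox` with the same `store` dict; deduplicating by `id` keeps overlapping snapshots from being counted twice.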

    • CombatWombat@feddit.online
      1 hour ago

      I feel pretty confident, despite a complete lack of evidence, that at least one state actor has had a listener running on the Fediverse continuously since the W3C started publishing the specs. I would also be surprised if big LLM providers like Anthropic and OpenAI didn’t run them as well; they certainly have the resources and the motivation to build them. You’re right that the vast majority of scrapers try to harvest historical data through the web frontend, but those are the scrapers I’m least afraid of. As a mental model for the average user, I think “assume every post is scraped” is the best stance.