• earthworm@sh.itjust.works
    English
    arrow-up
    9
    ·
    3 days ago

    Is there any reason someone can’t scrape the files and then make them more easily accessible/searchable?

    Is it illegal or something?

    • adhd_traco@piefed.social
      English
      arrow-up
      10
      ·
      edit-2
      3 days ago

      I don’t see how it would be illegal at all. I think it would take a lot of time, since there are supposedly hundreds of thousands of documents released, and many of them are handwritten reports, entries, etc. That’s not to mention any audio files that might have been released too. But they clearly didn’t even try. I should get results for “Barak” (former Israeli PM, alleged to be the sadistic pedophile rapist mentioned in Giuffre’s book).

      EDIT: Apparently, Trump’s DOJ lied again. Instead of the “several hundred thousands pages” they advertised today, it’s just under 4k.

      CBS:

      New documents span 3,965 files, totaling 3 GB of data

      The total number of files across all four new data sets is 3,965, with a total file size of about 3 GB. Nearly all of the files are PDFs, with one video file. Some of the files are individual images, while others are documents with many pages.

      • JC1@lemmy.ca
        arrow-up
        12
        ·
        3 days ago

        I mean… The pro-LLM people must surely have a tool to do OCR and analyse the natural language of documents… I personally don’t trust these tools that much, but they can surely do much more than I can.
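
        For what it's worth, the "searchable" part is the easy bit once text has been extracted. Here is a minimal sketch of a keyword index in plain Python, assuming OCR has already been done by some other tool and you have each file's text as a string. The file names, sample texts, and function names below are all made up for illustration, and this is nowhere near what a real search tool would do.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # index.get avoids creating empty entries for unknown tokens.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical placeholder texts standing in for OCR output (not real data).
docs = {
    "file_001.pdf": "Handwritten report, page one",
    "file_002.pdf": "Typed memo about the report",
    "file_003.pdf": "Scanned image caption",
}
index = build_index(docs)
```

        Calling `search(index, "report")` on the placeholder data returns the two files that mention the word. The slow and error-prone part would still be the OCR itself, especially for handwriting.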