I have downloaded a 2016 ‘dump’ of the marxists.org website. I was wondering if anyone wants to help me download all the PDFs/EPUBs that are missing, get all the PDFs/EPUBs out of the dump (as it is not well organized), put it all in neatly organized folders, and pack it into a .torrent for anyone to download. If anyone wants to help me with this endeavour or has any tips on ways to do it, I will greatly appreciate it. I am broke asf and can’t pay for wage slaves, sorry.
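For the "get all the PDFs/EPUBs out of the dump" step, something like the following Python sketch might be a starting point. It only assumes the dump is an ordinary directory tree on disk; the function name `collect_ebooks` and the folder names are made up for illustration, and it keeps each file's relative path rather than inventing a new layout, so any reorganizing would be a second pass.

```python
import shutil
from pathlib import Path

# Extensions named in the post; extend this set if other formats matter.
WANTED = {".pdf", ".epub"}


def collect_ebooks(dump_root: Path, out_root: Path) -> list[Path]:
    """Copy every PDF/EPUB under dump_root into out_root.

    The path relative to dump_root is preserved, so files with the same
    name in different folders do not overwrite each other.
    Assumption: out_root is NOT located inside dump_root.
    """
    copied = []
    for src in sorted(dump_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in WANTED:
            dest = out_root / src.relative_to(dump_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also keeps timestamps
            copied.append(dest)
    return copied


if __name__ == "__main__":
    # Hypothetical folder names; replace with the real locations.
    dump = Path("marxists-dump-2016")
    if dump.is_dir():
        files = collect_ebooks(dump, Path("marxists-ebooks"))
        print(f"copied {len(files)} files")
```

Copying instead of moving leaves the original dump untouched, which seems safer while the folder layout is still being worked out.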

The reason I want to do this is to make it easy and accessible for anyone to download any Marxist works. Even though I do like the website, I think there should be a way for anyone to just download everything.

Please don’t take this too harshly if I missed something or said something wrong; it is my first post.

    • klepti@lemmygrad.ml (OP) · 6 upvotes · 22 days ago

      Well, I found out I can just do a new dump of the website, but it’s taking a really long while, and if it was 700 GB+ in 2021 I don’t think it’s going to fit on my 1 TB drive… I’m using wget, which is supported by the website, and it works well, but yeah, it’s not going to fit, so idrk what to do. I may just not download some things/languages, ig.
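A selective download along those lines could be sketched as below: a small Python helper that assembles a wget command restricted to certain file types and skipping chosen directories, plus a free-space check. The wget options used (`--recursive`, `--level`, `--no-parent`, `--wait`, `--accept`, `--exclude-directories`, `--directory-prefix`) are standard ones, but the helper names, the output folder, and the excluded directory are assumptions for illustration only.

```python
import shutil


def build_wget_command(start_url, dest_dir, extensions=("pdf", "epub"),
                       exclude_dirs=()):
    """Return a wget argument list for a filtered recursive download."""
    cmd = [
        "wget",
        "--recursive",
        "--level=inf",        # wget's default recursion depth is 5
        "--no-parent",        # never climb above the start URL
        "--wait=1",           # pause between requests to go easy on the server
        "--accept=" + ",".join(extensions),
        "--directory-prefix=" + str(dest_dir),
    ]
    if exclude_dirs:
        cmd.append("--exclude-directories=" + ",".join(exclude_dirs))
    cmd.append(start_url)
    return cmd


def free_gib(path="."):
    """Free space on the drive holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    cmd = build_wget_command(
        "https://www.marxists.org/",
        "mia-selective",                       # hypothetical output folder
        exclude_dirs=("/some-language-dir",),  # hypothetical path to skip
    )
    print(f"{free_gib():.0f} GiB free")
    print(" ".join(cmd))
    # To actually start the download:
    # import subprocess; subprocess.run(cmd, check=True)
```

With `--accept`, wget still fetches HTML pages so it can follow their links, then deletes the ones that do not match, so the crawl takes about as long but the result on disk is much smaller.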

      • haui@lemmygrad.ml · 7 upvotes · 22 days ago

        I have around 8 TB of space I can use to download stuff, but we need to come up with a strategy for how to pack and redistribute it, imho. I’m on the Matrix server if you want to DM me. Would like to help.
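One possible piece of a pack-and-redistribute strategy, offered only as a suggestion, is a checksum manifest that ships alongside the files so that anyone holding a copy can verify it. The sketch below writes one line per file in the "hash, two spaces, relative path" layout that `sha256sum` prints; the function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<sha256>  <relative path>' for every file under root."""
    root = Path(root)
    manifest_path = Path(manifest_path)
    lines = []
    for f in sorted(p for p in root.rglob("*") if p.is_file()):
        if f.resolve() == manifest_path.resolve():
            continue  # do not list the manifest inside itself
        lines.append(f"{sha256_of(f)}  {f.relative_to(root).as_posix()}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # Hypothetical folder and manifest names.
    if Path("marxists-ebooks").is_dir():
        count = write_manifest("marxists-ebooks", "SHA256SUMS.txt")
        print(f"hashed {count} files")
```

A manifest like this also makes it easy to compare two people's partial downloads and see which files are still missing.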

  • knfrmity@lemmygrad.ml · 6 upvotes · 22 days ago

    I tried something like this before. I started with a curl command or script that would follow every internal link on a page to recursively download the website. That was annoying because there can be a lot of extra elements you don’t really need. Then I tried ArchiveBox, which was a bit OP for my purposes but may work well here.
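The "follow every internal link" part of such a script might look roughly like this in Python, using only the standard library. This is not the script described above, just a sketch of the link-extraction step; `LinkCollector`, `internal_links`, and the sample paths are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return sorted absolute URLs on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in parser.hrefs:
        # Resolve relative links and drop '#fragment' parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical page content and paths, for illustration only.
    sample = '<a href="/archive/x.pdf">x</a> <a href="https://example.org/">ext</a>'
    print(internal_links(sample, "https://www.marxists.org/archive/index.htm"))
```

A full crawler would fetch each returned URL, keep a set of already-visited pages, and repeat. Filtering the results by file extension at this stage is one way to skip the extra elements that are not needed.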

    • klepti@lemmygrad.ml (OP) · 4 upvotes · 22 days ago

      I’m currently trying out wget. It’s officially supported, so that’s good, but I’m struggling a bit with space and stuff, so I will probably end up just doing a selective download.

      • haui@lemmygrad.ml · 2 upvotes · 22 days ago

        What’s the problem with space? Do you need patterns for the addresses? I’m possibly in a position to help. Let me know what you have tried so far. I’m also on the Matrix server if you want to DM me.