cross-posted from: https://lemmy.dbzer0.com/post/41844010

The problem is simple: consumer motherboards don’t have that many PCIe slots, and consumer CPUs don’t have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 cheap computers, slot one GPU into each of them, and use them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected over a 10GbE network. I can get a 10GbE network running for this project.
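For a rough sense of what moving the GPUs onto a network costs in link speed, here is a back-of-the-envelope sketch. It compares raw line rates only, assuming the standard per-lane rates for PCIe gen 3 and gen 4 (8 and 16 GT/s with 128b/130b encoding) and ignoring all other protocol overhead on both PCIe and Ethernet. The function names are made up for illustration.

```python
# Back-of-the-envelope link bandwidth comparison (raw line rates only).
# Assumption: overhead beyond PCIe line encoding is ignored, so real
# throughput on both PCIe and Ethernet will be somewhat lower.

PCIE_GT_PER_LANE = {"gen3": 8.0, "gen4": 16.0}  # gigatransfers/s per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b encoding used by gen 3 and gen 4


def pcie_gbytes_per_s(gen: str, lanes: int) -> float:
    """Approximate one-direction PCIe bandwidth in GB/s."""
    return PCIE_GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes / 8


def ethernet_gbytes_per_s(gbits_per_s: float) -> float:
    """Raw Ethernet line rate converted from Gb/s to GB/s."""
    return gbits_per_s / 8


if __name__ == "__main__":
    ten_gbe = ethernet_gbytes_per_s(10)
    print(f"10GbE: {ten_gbe:.2f} GB/s")
    for gen in ("gen3", "gen4"):
        for lanes in (4, 8, 16):
            bw = pcie_gbytes_per_s(gen, lanes)
            print(f"PCIe {gen} x{lanes}: {bw:.2f} GB/s ({bw / ten_gbe:.1f}x 10GbE)")
```

By this estimate a 10GbE link carries far less than even a gen 3 x4 slot, so whether the idea works well probably depends on how much data the nodes have to exchange during inference.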

Does Ollama or any other local AI project support this? A server motherboard with a CPU is going to get expensive very quickly, so this would be a great alternative.

Thanks

  • ThreeJawedChuck@sh.itjust.works
    1 month ago

    Coincidentally, I have just been trying this using the llama.cpp server and two machines on my local LAN.

    I made a post about it in https://sh.itjust.works/c/localllama. I’m brand new to Lemmy (literal hours), so I may be doing this wrong, but this should be the link to my post: https://sh.itjust.works/post/39137051 I’m still a little confused about posting links in this federated system, so I hope that works. The upshot is that I got it working fine across two machines and it was easy to set up, but it has a few drawbacks that are minor to me.