Has anyone in an organization tried using self-hosted LLMs for agentic programming?

I’m curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend an alternative… I think it would be cheaper to run models on our own machines instead…
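
To put a rough number on the "cheaper on our own machines" idea, here is a minimal break-even sketch. Every figure in it is a made-up placeholder, not data from this thread, and the function name is only illustrative; plug in your organization's actual token bill and hardware quotes.

```python
def breakeven_months(hardware_cost, monthly_running_cost, monthly_api_spend):
    """Months until the saved API spend has paid for the hardware.

    Returns None when self-hosting never pays off, i.e. when the monthly
    running cost is at least as high as the current API bill.
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical figures, purely for illustration.
months = breakeven_months(
    hardware_cost=20_000,      # one-off purchase
    monthly_running_cost=500,  # power, maintenance, admin time
    monthly_api_spend=3_000,   # current hosted-token bill
)
print(f"Break-even after about {months:.1f} months")
```

This ignores quality differences between hosted and local models, so it only answers the cost half of the question.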

  • [object Object]@lemmy.ca
    5 days ago · 6 upvotes

    So self-hosting is still not great.

    The big problem is a hardware trade-off: you can get lots of memory but slow prompt processing, which limits the context window you can practically use, or you can get a semi-fast GPU with little memory, which caps the size of the models you can run.

    Sometimes I run pi agent in a container with Gemma 4 or Qwen 3.6, but even on Strix Halo, the quadratic slowdown after 60k tokens is brutal.

    We aren’t there yet for complex agentic workflows locally, and it’s primarily a hardware issue.

    Performance innovations are being shipped regularly, but they’re incremental.
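
A rough sketch of the "quadratic slowdown" mentioned above: in a standard transformer, the attention part of prompt processing grows roughly with the square of the context length, because every token attends to every earlier one. The helper below (a hypothetical name) only compares relative attention work and ignores the parts of the model that scale linearly, so treat the ratios as an upper bound on the slowdown, not a benchmark of any particular hardware.

```python
def relative_attention_cost(context_tokens, baseline_tokens):
    """Ratio of pairwise attention work at two context lengths (n^2 scaling)."""
    return (context_tokens / baseline_tokens) ** 2


# Compare a few context lengths against an 8k-token baseline.
for n in (8_000, 30_000, 60_000, 120_000):
    ratio = relative_attention_cost(n, 8_000)
    print(f"{n:>7} tokens: {ratio:6.1f}x the attention work of 8k")
```

Doubling the context quadruples the attention work, which is why long agentic sessions degrade so much faster than short chats.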

  • Warl0k3@lemmy.world
    5 days ago · 6 upvotes

    A few of the Chinese open models are okay for coding, but in terms of functionality they’re extremely basic. You can make them work, but if people are used to the big corpo models, it’s going to be hard to get them to switch to what is basically a chatbot, and the open-source tools that give them much-needed QoL functionality are pretty rough right now.

    Self-hosting is for sure worth looking into, but it’s going to take quite a bit of convincing to get people to shift over, I fear.

  • gravitas_deficiency@sh.itjust.works
    5 days ago · 4 upvotes

    There’s a ton of content out there about locally hosting LLMs and ML models in general, including a number of newer techniques for running models that are considerably bigger than your VRAM. I’d start by searching around for that stuff.

  • MystValkyrie@lemmy.blahaj.zone
    4 days ago · 2 upvotes

    Is 128 GB of RAM per unit enough for your organization’s use case? You could convince them to buy a Framework Desktop and then install an offline LLM on it (Ollama with Mistral, perhaps). Then you don’t have to rely on American companies or add to the environmental impact of data centers, and after the startup cost, it’s free from then on.

    Best of all, they can just be normal work computers when the bubble bursts.

    I wish I could just say, “Convince your company not to use AI,” but I’m sure your higher-ups aren’t taking no for an answer.
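
One way to sanity-check the "is 128 GB enough" question above is a back-of-the-envelope weight-size estimate: parameters times bits per weight, divided by 8. The sketch below assumes that rule of thumb plus a flat `overhead_gb` allowance for the KV cache, runtime, and OS; the function names and the 20 GB overhead are placeholders, and real usage depends on the runtime and context length.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions, bits_per_weight, ram_gb, overhead_gb):
    """True if weights plus an assumed overhead fit in the given memory."""
    return weights_gb(params_billions, bits_per_weight) + overhead_gb <= ram_gb


# Hypothetical model sizes and quantization levels, checked against 128 GB.
for params, bits in ((7, 16), (70, 4), (70, 16)):
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits, ram_gb=128, overhead_gb=20) else "does not fit"
    print(f"{params}B at {bits}-bit: ~{size:.0f} GB of weights, {verdict}")
```

Fitting in memory says nothing about speed, so this is a lower bar than "usable for agentic work".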

  • lukecyca@lemmy.ca
    5 days ago · 2 upvotes

    Pi.dev with Qwen3.6 running on a modest 6GB GPU is actually working pretty well for me on smallish, well-scoped agentic coding tasks.

  • Noxy@pawb.social
    4 days ago · 1 upvote, 2 downvotes

    Self-hosting an orphan crusher doesn’t sound like a meaningful improvement.