Has anyone tried using self-hosted LLMs for agentic programming in their organization?
I’m curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend something… I think it would be cheaper to run models on our own machines instead…
I just think for myself
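One way to check the “cheaper on our own machines” idea is a back-of-the-envelope break-even calculation. This is a minimal sketch. All figures are hypothetical placeholders, not real pricing, and it assumes local models could actually replace the hosted ones, which is the open question in this thread.

```python
# Hypothetical break-even sketch: every number below is a placeholder,
# not real pricing; substitute your own quotes and invoices.

def months_to_break_even(hardware_cost: float,
                         monthly_api_spend: float,
                         monthly_running_cost: float) -> float:
    """Months until buying hardware is cheaper than paying for API tokens.

    hardware_cost: one-off purchase price of the local machines
    monthly_api_spend: what is currently paid per month for hosted tokens
    monthly_running_cost: electricity, maintenance, admin time per month
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        # Local hosting never pays off if it costs as much per month as the API.
        return float("inf")
    return hardware_cost / monthly_saving


# Placeholder figures for illustration only.
months = months_to_break_even(hardware_cost=12_000,
                              monthly_api_spend=2_500,
                              monthly_running_cost=500)
print(f"Break-even after {months:.1f} months")  # 12000 / 2000 = 6.0
```

The running-cost term matters. If power and upkeep eat most of the saving, the payback period stretches out quickly.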
Mistral is based in France
What unique advantage and service does Mistral offer?
France isn’t in the US.
Which is great. I support France and Mistral. Is there anything else?
Does there need to be…?
Being in France.
Nice, I use Mistral
I had been using it for almost a year, but it’s really dumb compared to the big three US LLMs. I unsubscribed because “it’s not US” alone didn’t justify the fees.
Mistral?
So, self-hosting still isn’t great.
The big problem is that you can either get lots of memory but slow prompt processing, which limits your usable context window, or a semi-fast GPU with little memory, which caps the size of the models you can run.
Sometimes I run pi agent in a container with Gemma 4 or Qwen 3.6, but even on Strix Halo the quadratic slowdown is brutal past 60k tokens.
We aren’t there yet for complex agentic workflows run locally, and it’s primarily a hardware issue.
Though innovations in performance are being shipped regularly, they’re incremental.
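To put rough numbers on the quadratic slowdown mentioned above: in standard causal self-attention, each token is scored against every earlier token, so the attention part of prompt processing grows with the square of the context length. This is a minimal sketch. It counts only query-key pairs in one attention layer and ignores the linear-cost parts of the model (such as the MLP layers) as well as hardware effects, so real slowdowns will differ.

```python
def causal_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by one causal self-attention layer over n tokens.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n = n(n+1)/2.
    """
    return n_tokens * (n_tokens + 1) // 2


def relative_attention_cost(short_ctx: int, long_ctx: int) -> float:
    """How many times more attention work the long context needs than the short one."""
    return causal_attention_pairs(long_ctx) / causal_attention_pairs(short_ctx)


# Compare a few context sizes against an 8k-token baseline.
for ctx in (8_000, 16_000, 32_000, 60_000):
    ratio = relative_attention_cost(8_000, ctx)
    print(f"{ctx:>6} tokens: {ratio:5.1f}x the attention work of 8k")
```

Going from 8k to 60k tokens multiplies the attention work by roughly (60/8)² ≈ 56, even though the context only grew 7.5 times.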
There are a few Chinese open models that are okay for coding, but functionally they’re extremely basic. You can make them work, but if people are used to the big corpo models, it’s going to be hard to get them to switch to what is basically a chatbot. The open-source tools that would give them much-needed QoL features are also pretty rough right now.
Self-hosting is definitely worth looking into, but I fear it’s going to take quite a bit of convincing to get people to switch.
There’s a ton of content out there about hosting LLMs, and ML models in general, locally. There are also a number of newer techniques for running models that are considerably larger than your VRAM. I’d start by searching around for that stuff.
Is 128 GB of RAM per unit enough for your organization’s use case? You could convince them to buy a Framework Desktop and install an offline LLM on it (Ollama with Mistral, perhaps). Then you don’t have to rely on American companies or add to the environmental impact of data centers, and after the startup cost, it’s free from then on.
Best of all, they can just be normal work computers when the bubble bursts.
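For the Ollama setup suggested above, this minimal sketch shows how to send a prompt to a local model from Python using only the standard library. It assumes an Ollama server is already running on its default port, 11434, and that the `mistral` model has been pulled. The function names `build_generate_request` and `generate` are illustrative helpers, not part of any library.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumed); adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral", timeout: float = 300.0) -> str:
    """Send the prompt to a locally running Ollama server and return the completion text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, Ollama returns one JSON object whose "response" field holds the text.
    return body["response"]
```

After running `ollama pull mistral` and starting the server, `print(generate("Write a Python function that reverses a string."))` should print the model's answer. Nothing leaves the machine, which is the point of the suggestion.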
I wish I could just say, “Convince your company not to use AI,” but I’m sure your higher-ups aren’t taking no for an answer.
Pi.dev with Qwen3.6 running on a modest 6 GB GPU is actually working pretty well for me, for small, well-scoped agentic coding tasks.
Something like AIhorde as the foundation?
Self-hosting an orphan crusher doesn’t sound like a meaningful improvement.