Hello,
I have been looking into a new laptop, and came across ones with these NPUs heavily advertised. From some reading, they don’t seem very functional at this stage.
They seem to top out around 45-50 TOPS. I found some articles and comments suggesting that ‘could’ be useful for running smaller models locally, but also statements conflicting with that. As well, most, if not all, ‘technical uses’ of them seem locked into the Windows environment. Even AMD’s program for local LLM use (GAIA) requires a Windows server for it to communicate with, iirc.
So, is there, currently, any technical use for these, such that it makes much sense to grab a device with one for tinkering?
I’d considered experimenting with smaller models and seeing what comes of that (if small-model improvements pan out as DeepSeek proponents might suggest).
I’m also just generally new to the technology, but intrigued by the potential to run things locally; not least because of the potential to limit the environmental impact of large data centers.
Any comments, ideas, suggestions, or general pointing in a direction is very appreciated.
Thank you for taking the time. Have a good day!


you can run local llms, but npus are not that well established in this space. i currently run stuff using my igpu.
as a starter, for whatever npu you come across, search "insert_npu_name llama.cpp support". llama.cpp is one of the best established community ways to run llms locally. here is a github issue with links tracking support for intel/amd/qualcomm npus: https://github.com/ggml-org/llama.cpp/issues/9181
for practical purposes, you can run models smaller than 8B (a max stretch to around 24B, but performance gets really bad). this is considering a standard-ish laptop. (if you do not know what 8B is, do not worry, it is the size of the model: 8 billion parameters.)
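to give a feel for why 8B is the practical ceiling, here is the back-of-envelope math i use (the ~4.5 bits/weight figure is my own approximation for Q4_K_M quantisation, not an official spec, and it ignores context/KV-cache overhead):

```python
# Rough RAM estimate for a Q4_K_M-quantised model.
# 4.5 bits/weight is an assumed average, not an exact figure.
def q4_ram_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for b in (3, 8, 24):
    print(f"{b}B model at ~4.5 bits/weight: ~{q4_ram_gb(b):.1f} GB")
```

so an 8B model at Q4 wants roughly 4.5 GB of RAM just for the weights, which is about what a typical laptop can spare; 24B wants ~13.5 GB, which is why it gets painful.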
without even buying a new laptop, i suggest you get llama.cpp (it is easy to get started on all oses, there are prebuilt releases available). you will need some models, which you will find on hugging face (imagine github, but for models) (github is like an app store for code). what you have to look for are quantised models (you do not have to worry about what quantisation is right now). these are files ending with .gguf in the name. for example, start with https://huggingface.co/unsloth/SmolLM3-3B-128K-GGUF/blob/main/SmolLM3-3B-128K-Q4_K_M.gguf (smollm3, a model i like, by the huggingface team, based on open data). just download the file i linked.
after you install llama.cpp, a command llama-server will be available. then run

llama-server -m "path to gguf file you downloaded"

and you will get a localhost link which you can open in a browser and use as you would use chatgpt (it will just be a bit stupider and slower). there are many guides available online, so maybe try reading up on how to run llama.cpp, its various parameters, how to use a gpu, etc. if any of this is unclear, feel free to message me (here or on matrix (id in my profile page)). if needed, i can hop on a call to get it running for you.
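bonus: llama-server also exposes an openai-compatible http api, so you can script against it instead of using the browser ui. a minimal sketch (the helper names are mine; the default port 8080 and the /v1/chat/completions path match llama-server's defaults, but check the output your server prints on startup):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build a minimal OpenAI-style chat request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_local_llm(prompt: str,
                  url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST one chat message to a running llama-server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# usage (only works while llama-server is running):
# print(ask_local_llm("Say hello in one short sentence."))
```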
Thank you very much for all of this. It makes it seem very possible to at least experiment with these LLMs.
Unfortunately it’s not likely I’ll be able to take your advice.
I’ve been looking into new laptops in case I can find a job in tech again, but I don’t have the funds to buy anything. This is the system I have, though some of its hardware is failing, I think. The battery failed years ago, so it must stay plugged in all the time. Thankfully it turns on even with the battery removed, so it’s not an overcharging concern.
Sorry I couldn’t try your advice yet.
I’ll come back to it if I can.
Thank you again!
as i said, you do not even need new hardware. for tiny models, your 5 year old (or whatever) cpu is fine.
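to put a number on "fine": cpu inference is mostly memory-bandwidth-bound, so a crude speed ceiling is bandwidth divided by model size. the 40 GB/s figure below is an assumed typical dual-channel laptop value, not a measurement of your machine:

```python
# Crude upper bound on token generation speed: each generated token streams
# the whole model through RAM once, so tokens/s <= bandwidth / model size.
# 40 GB/s is an assumed laptop memory bandwidth, not a measured value.
def max_tokens_per_s(model_gb: float, bandwidth_gb_s: float = 40.0) -> float:
    return bandwidth_gb_s / model_gb

print(f"~2 GB model (3B at Q4):   <= {max_tokens_per_s(2.0):.0f} tok/s")
print(f"~4.5 GB model (8B at Q4): <= {max_tokens_per_s(4.5):.0f} tok/s")
```

real numbers come out lower than this ceiling, but even a 5-year-old cpu gives usable speeds on 3B-class models.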
I do not mean to press you, just maybe try it once. I for one do not find much use for them unless I have to send something to a superior for a proposal and just need a grammatical proofread or tonality check.
I just advocate for local use because the less you rely on big tech, the better. They harvest your data, and also take money from you while using your data. They ask you to pay even more if you do not want them to use it, and that just sounds like mafia behaviour to me.