Hello,
I have been looking into a new laptop, and came across ones with these NPUs heavily advertised. From some reading, they don’t seem very functional at this stage.
They seem to top out around 45-50 TOPS. I found some articles and comments suggesting that ‘could’ be useful for running smaller models locally, but also statements conflicting with that. As well, most, if not all, ‘technical uses’ of them seem locked into the Windows environment. Even AMD’s program for local LLM use (GAIA) requires a Windows server for it to communicate with, iirc.
So, is there, currently, any technical use for these, such that it makes much sense to grab a device with one for tinkering?
I’d considered experimenting with smaller models and seeing what comes of that (if small-model improvements pan out as DeepSeek proponents might suggest).
I’m also just generally new to the technology, but intrigued by the potential to run things locally; not least because of the potential to limit the environmental impact of large data centers.
Any comments, ideas, suggestions, or general pointing in a direction is very appreciated.
Thank you for taking the time. Have a good day!


you can run local llms, but npus are not that well established in this space. i currently run stuff using my igpu.
as a starter, for whatever npu you come across, search "insert_npu_name llama.cpp support". llama.cpp is one of the best established community ways to run llms locally. here is a github issue with links tracking support for intel/amd/qualcomm npus: https://github.com/ggml-org/llama.cpp/issues/9181
for practical purposes, you can run models smaller than 8B (a max stretch to around 24B, but performance gets really bad). this is considering a standard-ish laptop. (if you do not know what 8B is, do not worry, it is the size of the model: 8 billion parameters.)
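to give a feel for why 8B is the practical ceiling, here is the back-of-envelope math i use (the ~4.5 bits/weight figure is my own approximation for Q4_K_M quantisation, not an official spec, and it ignores context/KV-cache overhead):

```python
# Rough RAM estimate for a Q4_K_M-quantised model.
# 4.5 bits/weight is an assumed average, not an exact figure.
def q4_ram_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for b in (3, 8, 24):
    print(f"{b}B model at ~4.5 bits/weight: ~{q4_ram_gb(b):.1f} GB")
```

so an 8B model at Q4 wants roughly 4.5 GB of RAM just for the weights, which is about what a typical laptop can spare; 24B wants ~13.5 GB, which is why it gets painful.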
without even buying a new laptop, i suggest you get llama.cpp (it is easy to get started on all oses, there are prebuilt releases available). you will need some models, which you will find on hugging face (imagine github, but for models) (github is like an app store for code). what you have to look for are quantised models (you do not have to worry about what quantisation is right now). these are files ending with .gguf in the name. for example, start with https://huggingface.co/unsloth/SmolLM3-3B-128K-GGUF/blob/main/SmolLM3-3B-128K-Q4_K_M.gguf (smollm3, a model i like, by the huggingface team, based on open data). just download the file i linked.
after you install llama.cpp, a command llama-server will be available. then run

llama-server -m "path to gguf file you downloaded"

and you will get a localhost link which you can open in a browser and use as you would use chatgpt (it will just be a bit stupider and slower). there are many guides available online, so maybe try reading up on how to run llama.cpp, its various parameters, how to use a gpu, etc. if any of this is unclear, feel free to message me (here or on matrix (id in my profile page)). if needed, i can hop on a call to get it running for you.
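bonus: llama-server also exposes an openai-compatible http api, so you can script against it instead of using the browser ui. a minimal sketch (the helper names are mine; the default port 8080 and the /v1/chat/completions path match llama-server's defaults, but check the output your server prints on startup):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build a minimal OpenAI-style chat request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_local_llm(prompt: str,
                  url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST one chat message to a running llama-server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# usage (only works while llama-server is running):
# print(ask_local_llm("Say hello in one short sentence."))
```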
Thank you very much for all of this. It makes it seem very possible to at least experiment with these LLMs.
Unfortunately it’s not likely I’ll be able to take your advice.
I’ve been looking into new laptops in case I can find a job in tech again, but I don’t have the funds to buy anything. This is the system I have, though some of its hardware is failing, I think. The battery failed years ago, so it must stay plugged in all the time. Thankfully it turns on even with the battery removed, so it’s not an overcharging concern.
Sorry I couldn’t try your advice yet.
I’ll come back to it if I can.
Thank you again!
as i said, you do not even need new hardware. for tiny models, your 5 year old (or whatever) cpu is fine.
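to put a number on "fine": cpu inference is mostly memory-bandwidth-bound, so a crude speed ceiling is bandwidth divided by model size. the 40 GB/s figure below is an assumed typical dual-channel laptop value, not a measurement of your machine:

```python
# Crude upper bound on token generation speed: each generated token streams
# the whole model through RAM once, so tokens/s <= bandwidth / model size.
# 40 GB/s is an assumed laptop memory bandwidth, not a measured value.
def max_tokens_per_s(model_gb: float, bandwidth_gb_s: float = 40.0) -> float:
    return bandwidth_gb_s / model_gb

print(f"~2 GB model (3B at Q4):   <= {max_tokens_per_s(2.0):.0f} tok/s")
print(f"~4.5 GB model (8B at Q4): <= {max_tokens_per_s(4.5):.0f} tok/s")
```

real numbers come out lower than this ceiling, but even a 5-year-old cpu gives usable speeds on 3B-class models.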
I do not mean to press you, just maybe try it once. I for one do not find much use for them unless I have to send something to a superior for a proposal and just need a grammatical proofread or tonality check.
I just advocate for local use because the less you rely on big tech, the better. They harvest your data, and also take money from you while using your data. They ask you to pay even more if you do not want them to use it, and that just sounds like mafia behaviour to me.