Intel NPU Acceleration Library: Running Lightweight LLMs Like TinyLlama on Meteor Lake AI PCs

2024-03-03 00:13:47

LLM on the move

Intel has released an open source NPU acceleration library to enable Meteor Lake AI PCs to run lightweight LLMs like TinyLlama.

Although the library is primarily aimed at developers, regular users with coding experience can also use it to run AI chatbots on Meteor Lake.

The library is already live on GitHub. Intel has yet to publish a blog post announcing it, but Intel Software Architect Tony Mongkolsmai was quick to publicize it on X.

He posted a demo of the software running TinyLlama 1.1B Chat on an MSI Prestige 16 AI Evo laptop with a Meteor Lake CPU, asking it about the pros and cons of smartphones versus flip phones.

The library works on both Windows and Linux.

Of course, the NPU acceleration library is aimed at developers rather than general users, so getting it to do anything useful on your own machine is not a trivial task.

Mongkolsmai shared the code he wrote to run the chatbot, but to reproduce it on your own PC you either need a solid grasp of Python or you have to retype the lines from his screenshots and hope they work.
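For reference, here is a minimal sketch of what such a script looks like, adapted from the example in the library’s GitHub README (the package installs via pip as intel-npu-acceleration-library); the compile() call and its dtype argument come from that README, and the prompt is just the question from the demo:

```python
import torch
import intel_npu_acceleration_library
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# TinyLlama 1.1B Chat, the model used in Mongkolsmai's demo.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Offload the model to the Meteor Lake NPU, quantizing to int8
# (per the project's README example).
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

query = "What are the pros and cons of smartphones versus flip phones?"
input_ids = tokenizer(query, return_tensors="pt")["input_ids"]

# Stream the answer to stdout as tokens are generated.
_ = model.generate(input_ids=input_ids, streamer=streamer, max_new_tokens=256)
```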

The NPU acceleration library is, as the name suggests, built specifically for NPUs, which means it can only run on Meteor Lake at this time.

Arrow Lake and Lunar Lake, expected later this year, should widen the field of compatible CPUs.

These upcoming CPUs will likely deliver 3x the AI performance of Meteor Lake, enabling even larger LLMs to run on laptop and desktop silicon.

The library is not yet feature-complete, with less than half of its planned functionality shipped.

Most notably, it still lacks mixed-precision inference running on the NPU itself, support for BFloat16 (a number format widely used in AI workloads), and NPU-GPU heterogeneous computation.
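To make the BFloat16 gap concrete, here is a hypothetical guard; the compile_for_npu wrapper is my own illustration rather than part of the library, and it assumes (per the README) that float16 and int8 are the dtypes the shipped library accepts:

```python
import torch
import intel_npu_acceleration_library as npu_lib

def compile_for_npu(model: torch.nn.Module, dtype: torch.dtype = torch.float16):
    """Compile a model for the NPU, rejecting dtypes the library
    cannot handle yet (BFloat16, per the feature list above)."""
    if dtype == torch.bfloat16:
        raise NotImplementedError(
            "BFloat16 is not yet supported by the NPU acceleration "
            "library; use torch.float16 or torch.int8 for now."
        )
    return npu_lib.compile(model, dtype=dtype)
```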

Since the NPU acceleration library is brand new, it is unclear how much of an impact it will have, but we hope that it will lead to new AI software for AI PCs.

Source: Tom’s Hardware – Intel’s NPU Acceleration Library goes open source — Meteor Lake CPUs can now run TinyLlama and other lightweight LLMs

Explanation:

Intel releases an NPU library, enabling lightweight LLMs to run locally

I think it’s wonderful.

It’s very much like a beta version at the moment, but I think it will be fully usable in 2-3 years.

I’m sure some people will criticize me for calling it “useless,” but I think it’s still 2-3 years away from being practical, so I don’t consider it useful at this point.

I haven’t changed my opinion that it’s useless at this point, but I’m not saying it’s completely meaningless.

However, I have to ask whether it is useful to the general public.

Let me state that clearly.

Even the scripts I distribute for general users get dismissed as “too much hassle,” so something like this is a complete non-starter for them (wry smile).

You can’t expect ordinary people who can’t write even a trivial script themselves to use it just because a library has been made available.

I think it will take at least another 2-3 years for it to be incorporated into products that can be used by the general public.

If that’s the case, then given that ChatGPT can already be used over the internet, I don’t think it would have any impact on the general public even if this ran locally (wry smile).

What is the point of running LLMs and image-generation AI locally? I honestly don’t get it.

If you need the functionality, you just need an internet connection and an API to call; I don’t really see the need to force it onto the local PC.
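For comparison, the cloud route amounts to a few lines, e.g. with OpenAI’s Python client (a sketch assuming an OPENAI_API_KEY environment variable; the model name is illustrative):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The same kind of question, answered by a hosted model instead of a local NPU.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Pros and cons of smartphones vs flip phones?"}],
)
print(response.choices[0].message.content)
```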

I’m not taking sides here, but when I see stories like this, I want to ask, “Isn’t the cloud good enough?”…


