Who needs expensive Nvidia GPUs? Surprisingly, a 1997 Pentium II CPU can run an LLM.
AI on a Time Machine
You might think you need to spend a small fortune on high-end Nvidia GPUs to run the latest AI models. But hold your horses! EXO Labs (via Indian Defence Review) has managed to get the Llama 2 LLM up and running on a relic of the past: a Windows 98 box with a humble Pentium II processor [1]!
Imagine pressing a 1997 machine into service for a task that demands brute computational power! That's what EXO Labs did, and the result was a whopping 20,000-fold slowdown compared to a modern GPU [1]. The cost? Just under $120, snagged from eBay. The real challenge wasn't the price tag but getting the peripherals to work, since the machine needed legacy PS/2 input devices and USB support was shaky at best [1].
Transferring the required files onto the machine was a headache in itself. Then there was compiling the code for the Pentium II's ancient instruction set [1]. But with the code and hardware sorted, it was time to unleash Llama 2 [1].
The 260K-parameter version of the model achieved 39.31 tokens per second, while the bigger 15M-parameter version managed just 1.03 tokens per second on the Pentium II [1]. They even attempted a partial run of a 1-billion-parameter Llama 3.2 model, which crawled along at a glacial 0.0093 tokens per second [1]. For context, that same 1B Llama 3.2 model hits 40 tokens per second on Arm CPUs and a staggering 200 tokens per second on GPUs [1].
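As a rough back-of-the-envelope check, those throughput figures can be inverted into time per token and compared directly. The Python sketch below uses only the numbers quoted in the article; the dictionary labels and helper names are our own, and it deliberately ignores prompt processing, memory limits and everything else that shapes real-world speed.

```python
# Throughput figures quoted in the article, in tokens per second [1].
# Labels are illustrative; only the numbers come from the source.
throughput = {
    "Llama 2, 260K params, Pentium II": 39.31,
    "Llama 2, 15M params, Pentium II": 1.03,
    "Llama 3.2, 1B params, Pentium II": 0.0093,
    "Llama 3.2, 1B params, Arm CPU": 40.0,
    "Llama 3.2, 1B params, GPU": 200.0,
}


def seconds_per_token(tokens_per_second: float) -> float:
    """Invert throughput to get the average time spent on each token."""
    return 1.0 / tokens_per_second


def slowdown(baseline_tps: float, other_tps: float) -> float:
    """Return how many times slower `other_tps` is than `baseline_tps`."""
    return baseline_tps / other_tps


if __name__ == "__main__":
    # Print the time each setup spends per generated token.
    for name, tps in throughput.items():
        print(f"{name}: {seconds_per_token(tps):.3f} s/token")

    # Compare the GPU and the Pentium II on the same 1B model.
    gpu = throughput["Llama 3.2, 1B params, GPU"]
    pii = throughput["Llama 3.2, 1B params, Pentium II"]
    print(f"GPU vs Pentium II on the 1B model: ~{slowdown(gpu, pii):,.0f}x")
```

On these figures, the GPU works out to roughly 21,500 times the Pentium II's throughput on the 1B model, in the same ballpark as the 20,000-fold gap cited earlier, and at 0.0093 tokens per second each token takes close to two minutes.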
Despite the excitement of getting a modern LLM running on such an antiquated CPU, the performance gap is a stark reminder of why speed matters [1]. It's like trying to play the latest 3D games on a Pentium II: you could try, but the results would be a far cry from what you'd get with a modern GPU [2]. At that frame rate, it's all a bit pointless.
Maybe it would be fun to watch the pixels being rendered one by one, but completing a benchmark run could take years. So, for now, let's leave all that in the past and embrace the power of modern GPUs for AI [1].
Jeremy Laird
Jeremy has been writing about tech and PCs since the Netburst era (Google it!) and loves dissecting the technicalities of monitor input lag and overshoot. He also has a fascination with advanced lithography and machines that make delightful "ping" noises. When he's not geeking out, Jeremy indulges in his passion for tennis and cars.
Enrichment insights:
- There is a massive performance difference between a Pentium II processor and modern GPUs, primarily due to the architectural improvements in modern GPUs built for AI workloads.
- Full-size models like the larger Llama 2 variants need powerful GPUs (such as the RTX 4090 or RTX 5090) with ample VRAM and tensor cores to run billions of parameters efficiently.
- Smaller models can run efficiently on a single high-end consumer GPU (e.g., the Nvidia RTX 4090).
- A Pentium II processor cannot effectively run large-scale LLMs; its performance is orders of magnitude slower than that of modern GPUs.
- Modern GPU acceleration is essential for practical LLM inference today.
[1] Data derived from EXO Labs' experiments running Llama 2, as reported by Indian Defence Review.
[2] The comparison between running Cyberpunk 2077 on a Pentium II and the Pentium II's performance on Llama 2 is purely metaphorical; actual results would vary greatly.
- AI models like Llama 2 might seem to require expensive, high-end Nvidia GPUs, but an antiquated Pentium II machine bought on eBay for just under $120 managed to run one, albeit roughly 20,000 times slower than a modern GPU.
- The real challenge wasn't the price tag but getting peripherals working on the 1997 machine, which needed legacy PS/2 input devices because USB support was shaky at best.
- Transferring the required files and compiling the code for the Pentium II's ancient instruction set were both headaches.
- For context, the 1-billion-parameter Llama 3.2 model hits 40 tokens per second on Arm CPUs and a staggering 200 tokens per second on GPUs, while the bigger 15M-parameter version of Llama 2 managed just 1.03 tokens per second on the Pentium II.
- It's like trying to play the latest 3D games on a Pentium II: the results would be a far cry from what you'd get with a modern GPU.
- Watching pixels render one by one might be fun, but completing a benchmark run could take years, so for now it's best to leave such experiments in the past and embrace the power of modern GPUs for AI.
- Modern GPUs remain essential for practical AI inference today, much as they are for gaming.


