Blast AI into the past: modders make Llama AI work on an old Windows 98 PC
Remember when you were young, you had far fewer responsibilities and you were still at least somewhat hopeful about the future potential of technology? Anyway! At the moment, nothing seems safe from the sticky fingers of so-called AI, including the nostalgic hardware of yesteryear.
Exo Labs, a company whose mission is to democratize access to artificial intelligence, such as large language models, has lifted the lid on its latest project: a modified version of Meta Llama 2 running on a Windows 98 Pentium II machine (via Hackaday). While it’s not Llama’s latest model, it’s no less stunning—even for me, a frequent AI skeptic.
To be fair, when it comes to big tech's influence over artificial intelligence, Exo Labs and I seem equally wary. So, setting my own AI skepticism aside for now, this is undoubtedly an impressive project, not least because it doesn't rely on an energy-hungry, environmentally costly data center as an intermediary.
The journey to Llama, powered by ancient (albeit genuine) hardware, took some unexpected turns. After purchasing a used machine, Exo Labs had to hunt down compatible PS/2 peripherals and then figure out how to get the necessary files onto the decades-old machine at all. Did you know FTP over an Ethernet cable is backwards compatible to that extent? I certainly didn't!
But don't be fooled – I'm making that sound a lot easier than it actually was. Even before the FTP problem could be solved, Exo Labs had to find a way to compile modern code for a pre-Pentium Pro machine. In short, the team settled on Borland C++ 5.02, "a 26-year-old IDE and compiler that ran directly on Windows 98." Compatibility issues with modern C++ persisted, however, so the team fell back on an older dialect of C and had to put up with declaring every variable at the beginning of each function. Ugh.
Then there's the hardware itself. For those in need of a refresher, the Pentium II machine has a mere 128MB of RAM, while the full-size Llama 2 LLM boasts 70 billion parameters. With all of those serious limitations in mind, the results are even more interesting.
Unsurprisingly, Exo Labs had to create a drastically slimmed-down version of Llama for this project, which is now freely available on GitHub. As a result of all of the above, the pared-back LLM has 1 billion parameters and produces 0.0093 tokens per second – which isn't much, but the headline here is that it works at all.