Donostia, Spain – April 8, 2025 – Multiverse Computing today launched two new AI models compressed by CompactifAI, Multiverse's AI compressor: 80% compressed versions of Llama 3.1-8B and Llama 3.3-70B.
Both models have 60% fewer parameters than the originals, 84% better energy efficiency, 40% faster inference, and yield a 50% cost reduction without sacrificing accuracy, according to Multiverse. "AI developers can immediately plug the models into any application – edge, on-premise, or cloud," the company said.
Multiverse will release versions of the top LLMs compressed by CompactifAI over the coming months.
"Meta's Llama 4 release underscores a major shift in AI: smaller, more powerful, and multimodal models are no longer optional – they're the new default," said Dmitry Zakharchenko, chief software officer at Blaize, a U.S. edge AI chip company. "As AI moves from cloud to edge, success depends on models that are efficient, affordable, and fully programmable."
Multiverse said CompactifAI is the first compressor of its kind, using quantum-inspired tensor networks to make AI systems more efficient and portable, reducing model size by up to 93% with only a 2-3% drop in accuracy – a striking result compared with the 20-30% accuracy loss typical of industry-standard techniques at 50-60% compression.
"CompactifAI is changing the economics of AI processing and opening up new use cases for AI models," said Enrique Lizaso Olmos, CEO of Multiverse Computing. "Efforts to curb unwieldy models have come up short. Our novel approach to compression, grounded in quantum-inspired techniques, makes it possible to pair performance with processing efficiency and gives us a huge edge over LLM providers."
Multiverse Computing was founded in 2019 by pioneers in quantum-inspired software to develop novel solutions to complex business problems. In 2023 the company began applying its core technology to address the AI energy crisis with CompactifAI.
LLM providers have turned to techniques such as pruning and quantization to compress models but have yet to eliminate the tradeoff between size and performance. For instance, Llama 3.1-8B Slim by CompactifAI requires 300x fewer training tokens than Meta's CAI Llama 3, and 3x fewer training tokens than Nvidia's Llama 3.1-Minitron, while outperforming both across benchmarks. For Llama 3.3-70B Slim by CompactifAI, comparative benchmarks show an increase in reasoning capabilities while maintaining the original precision.
"We're rapidly delivering compressed versions of the most powerful LLMs in the world," said Sam Mugel, Chief Technology Officer at Multiverse. "The advanced capabilities of these two massive models can now fit into smartphones, laptops, and cars, or real-world machines like oil rigs and satellites. Our aggressive roadmap to roll out dozens of compressed, leading LLMs could dramatically accelerate the impact of AI in the real world."