• The Midas Report

Small Models, Big Ambition as Multiverse Puts AI on the Edge


Two newly released ultra-small models, SuperFly and ChickBrain, aim to cram high performance into tiny packages for phones, PCs, and even microcontrollers, alongside an AWS-hosted API that the company says often charges lower token fees than competitors.

Multiverse Computing is making a direct play for on-device AI with two compact models designed for smartphones, PCs, and IoT hardware. The company introduced SuperFly and ChickBrain, calling them the world’s smallest models that still deliver high performance for chat and speech, and in one case reasoning. The pitch is not about topping leaderboards against giant systems; it is about shrinking models so much that they fit where developers want them to run. “We can compress the model so much that they can fit on devices,” said founder Román Orús, adding, “You can run them on premises, directly on your iPhone, or on your Apple Watch.”

The technology behind that claim is CompactifAI, a quantum-inspired compression algorithm that shrinks existing models. Orús describes it as different from typical computer science or machine learning approaches, calling it “a more subtle and more refined compression algorithm.” SuperFly is a compressed version of Hugging Face’s SmolLM2-135M, brought down from 135 million parameters to 94 million.
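The article does not disclose how CompactifAI’s quantum-inspired compression actually works, but the general idea behind shrinking a model by factorizing its weight matrices can be sketched with a plain low-rank SVD. Everything below (the function name, layer sizes, and rank choice) is an illustrative assumption, not Multiverse’s method:

```python
import numpy as np

def low_rank_compress(W, rank):
    """Approximate weight matrix W with a rank-`rank` factorization.

    Returns factors (A, B) whose product approximates W, plus the
    parameter counts before and after compression.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank), singular values folded in
    B = Vt[:rank, :]             # shape (rank, n)
    original = W.size
    compressed = A.size + B.size
    return A, B, original, compressed

# Example: truncate a hypothetical 1024x1024 layer to rank 64
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
A, B, before, after = low_rank_compress(W, rank=64)
print(before, after)  # 1048576 131072
```

Truncating that layer to rank 64 keeps 131,072 parameters instead of 1,048,576, an 8x reduction for the layer alone; real compression pipelines pick ranks per layer and typically fine-tune afterward to recover accuracy.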

Orús likened it to “having a fly, but a little bit more clever.” ChickBrain compresses Meta’s Llama 3.1 8B to 3.2 billion parameters and, the company says, slightly outperforms the original in internal tests. In practice, Multiverse is targeting performance parity at a fraction of the size rather than chasing absolute state-of-the-art metrics.

Running Locally and Reducing Costs

The deployment story is as important as the compression. Multiverse says the models can be embedded in IoT devices and run locally on smartphones, tablets, and PCs, including fully offline. The company demonstrated SuperFly handling a voice interface on hardware as modest as an Arduino, and says ChickBrain can run on a MacBook with no internet connection. For teams that still want hosted access, the models are available through an API on AWS, which the company says often comes with lower token fees than competitors charge.
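Why these parameter counts matter for on-device use comes down to simple arithmetic: weight memory is roughly parameters times bytes per parameter. The article states no precisions, so the fp16 and int4 figures below are back-of-envelope assumptions rather than published specs:

```python
def model_memory_gb(params: float, bytes_per_param: float) -> float:
    """Rough footprint of model weights alone, in GB (decimal).

    Ignores activations, KV cache, and runtime overhead, so real
    requirements are somewhat higher.
    """
    return params * bytes_per_param / 1e9

# Back-of-envelope numbers for the two models in the article,
# at two hypothetical precisions:
for name, params in [("SuperFly", 94e6), ("ChickBrain", 3.2e9)]:
    for label, bpp in [("fp16", 2), ("int4", 0.5)]:
        print(f"{name} @ {label}: {model_memory_gb(params, bpp):.2f} GB")
```

Under these assumptions, ChickBrain’s 3.2 billion parameters need about 6.4 GB of weight memory at fp16, already within reach of a modern laptop, and roughly 1.6 GB at int4, while SuperFly’s 94 million parameters stay well under a quarter of a gigabyte at either precision, which is what makes microcontroller-class targets plausible.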

There is business momentum behind the engineering. Multiverse raised €189 million in June, bringing total funding since 2019 to about $250 million, and has roughly 100 employees. The latest round was led by Bullhound Capital, with HP Tech Ventures and Toshiba participating. On the go-to-market side, the company says it is in talks with Apple, Samsung, Sony, and HP, and it counts BASF, Ally, Moody’s, and Bosch among customers for its compression technology beyond language models, including image recognition. The models live inside a Model Zoo product family named after animal brain sizes, a branding choice that lines up with the focus on shrinking footprints while preserving useful behavior.

A Shift Toward Smaller but Capable Models

This push does not happen in isolation. Multiverse cited a wave of compressed releases across the ecosystem, including Llama 4 Scout, Mistral Small 3.1, two new open models from OpenAI, and DeepSeek R1 Slim. The signal is that compact models are gaining attention, whether to serve edge devices, reduce latency, or offer a different cost profile than heavyweight systems. If a 3.2 billion parameter model can meet many tasks where an 8 billion parameter model would have been used, and if a 94 million parameter system can power a voice interface on simple hardware, developers have more deployment choices.

For founders and operators, the practical takeaway is straightforward. If the same or slightly better task results are available at a fraction of the size, efficiency becomes a primary product decision. Device native execution opens up offline and privacy sensitive workflows across phones, laptops, and appliances. For teams that prefer cloud, an API that often charges lower token fees is a reminder that unit economics are moving targets. Multiverse is not claiming to beat the largest frontier models. It is betting that small, high performing models are good enough for a long list of real workloads and that getting them onto everyday hardware will matter.

None of this guarantees that compression alone will solve every edge use case. But Multiverse’s approach puts weight behind a clear direction. Make models smaller without a performance hit, make them run locally, and make access cheaper. If that holds, it will change where and how AI features are built, from IoT sensors to consumer devices to enterprise PCs. The competition to deliver more capability per parameter is now a product strategy, not just a research claim.

Multiverse’s SuperFly and ChickBrain show how compact models can push AI into devices and appliances while reshaping cost and deployment choices. Watch the small end of the model zoo.