Artificial intelligence startup Liquid AI today released and open-sourced LFM2.5-8B-A1B, a new on-device large language model. Designed for tool calling and complex instruction following on consumer-grade hardware, the model significantly improves reasoning performance on edge devices while keeping compute costs very low.

Architecturally, the model uses a sparse mixture-of-experts (MoE) design with 8.3B total parameters. Because of this sparsity, only 1.5B parameters are activated for each generated token, which lets the model run smoothly on local devices such as smartphones and laptops.
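
To make the sparsity claim concrete, here is a minimal sketch of top-k expert routing together with the active-parameter arithmetic. The parameter counts come from the article; the expert count, the value of k, and the softmax-over-selected-experts weighting are hypothetical illustrations, not the model's published configuration.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    # (Assumed weighting scheme; the article does not describe the router.)
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Figures quoted in the article.
TOTAL_PARAMS_B = 8.3
ACTIVE_PARAMS_B = 1.5
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # roughly 0.18

# Hypothetical toy configuration, NOT the model's real expert count or k.
NUM_EXPERTS = 32
TOP_K = 4

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits, TOP_K)

print(f"active fraction per token: {active_fraction:.1%}")
print(f"experts used for this token: {[i for i, _ in routes]}")
```

Under these figures, each token touches roughly 18% of the weights, which is why per-token compute stays close to that of a 1.5B dense model even though all 8.3B parameters must still be stored.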

Longer Context and Stronger Reasoning

Compared with its predecessor, LFM2.5 expands the context window from 32K to 128K tokens and increases the pre-training data from 12T to 38T tokens. As a reasoning-only model, it generates an explicit reasoning chain before producing the final answer, and its high-compression vocabulary efficiently handles nine languages, including Chinese and Arabic.

To address problems such as repetitive loops and hallucinations in long reasoning chains, the development team introduced two-stage reinforcement learning (RL) during training. Preference optimization reduces the tendency to get stuck in loops during long-chain reasoning, while a dedicated anti-hallucination reward teaches the model to decline questions that fall outside its knowledge.

Powerful Edge Performance and Full Ecosystem Compatibility

On benchmarks, LFM2.5 scores far higher than its predecessor in logical reasoning and anti-hallucination tests, and it rivals larger models in instruction following. For tool calling, the model outputs Python-style function calls by default and can be switched to JSON format through the system prompt.
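
As an illustration of how a Python-style tool call relates to its JSON equivalent, the sketch below parses a call string into a name-plus-arguments dictionary without executing it. The `get_weather` function and its arguments are hypothetical, and the article does not describe any wrapper tokens or the exact output layout the model uses, so this covers only a bare single call with keyword arguments.

```python
import ast
import json


def parse_python_tool_call(text):
    """Parse a call such as `f(a=1, b="x")` into {"name": ..., "arguments": {...}}."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    if node.args:
        raise ValueError("positional arguments are not supported in this sketch")
    arguments = {}
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError("**kwargs expansion is not supported in this sketch")
        # literal_eval accepts only literals, so no model-generated code runs.
        arguments[kw.arg] = ast.literal_eval(kw.value)
    return {"name": node.func.id, "arguments": arguments}


# Hypothetical tool call, used only for illustration.
call = parse_python_tool_call('get_weather(city="Paris", days=3)')
as_json = json.dumps(call, sort_keys=True)
print(as_json)
```

The Python form is typically shorter in tokens than the equivalent JSON object, which matters on devices where decode speed is the bottleneck; JSON remains useful when downstream tooling already expects it.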

The model had day-one support from mainstream inference frameworks, including llama.cpp, MLX, vLLM, and SGLang. In hardware tests, its decoding speed reached up to 253 tokens per second on the M5 Max chip and about 30 tokens per second on mobile devices, combining the privacy of on-device operation with high efficiency.
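
Because the model emits a reasoning chain before its answer, decode speed translates directly into waiting time. The back-of-the-envelope sketch below assumes the quoted rates are decode throughput in tokens per second, uses a hypothetical 1,000-token response, and ignores prompt processing time.

```python
def decode_seconds(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady decode rate (prefill ignored)."""
    return num_tokens / tokens_per_second


# Hypothetical response length (reasoning chain plus final answer).
RESPONSE_TOKENS = 1000

# Decode rates quoted in the article, read as tokens per second (assumption).
M5_MAX_RATE = 253
MOBILE_RATE = 30

m5_max_s = decode_seconds(RESPONSE_TOKENS, M5_MAX_RATE)
mobile_s = decode_seconds(RESPONSE_TOKENS, MOBILE_RATE)

print(f"M5 Max: {m5_max_s:.1f} s, mobile: {mobile_s:.1f} s")
```

Under these assumptions the same response takes about 4 seconds on the M5 Max and about 33 seconds on a phone.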