At a juncture where large-model competition is shifting from a parameter race to an efficiency contest, MiniMax released its new open-source reasoning model M2 on October 27th. With a set of precise engineering trade-offs, M2 anchors itself in intelligent agents, the core battlefield of next-generation AI applications.
M2 adopts a Mixture-of-Experts (MoE) architecture with 230 billion total parameters, but activates only about 10 billion of them per forward pass, reaching an output speed of up to 100 tokens per second — a figure that gives it a clear edge in real-time interaction scenarios. More importantly, M2 is purpose-built for intelligent agents, strengthening reasoning continuity and response efficiency in behavioral decision-making, multi-turn task planning, and environmental interaction, and providing a foundational engine for building truly autonomous AI agents.
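The sparse-activation idea behind that 230B/10B split can be shown in miniature. Below is a minimal top-k expert-routing sketch; the expert count, hidden size, and router here are made-up illustrative values, not M2's published configuration:

```python
import numpy as np

# Illustrative MoE layer: many experts exist, but only the top-k
# chosen by a learned router actually run for each token.
# All dimensions below are made up for clarity, not M2's real config.
NUM_EXPERTS = 16   # total experts in the layer
TOP_K = 2          # experts activated per token
D_MODEL = 64       # hidden size

rng = np.random.default_rng(0)
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    """Route one token vector through its top-k experts only."""
    logits = x @ router_w                      # (NUM_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only TOP_K expert matrices are touched; the other 14 stay idle.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

token = rng.normal(size=D_MODEL)
out, used = moe_forward(token)
print(f"activated {len(used)}/{NUM_EXPERTS} experts")  # activated 2/16 experts
```

With 16 experts and k = 2, only one-eighth of the expert parameters participate per token — the same principle, at toy scale, by which M2 activates roughly 10B of its 230B parameters.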

Notably, compared to its predecessor M1, M2 makes a strategic adjustment to the context window, reducing it from the 1 million tokens M1 supported to 204,800 tokens. This is not a technological regression but a pragmatic trade-off between long-text processing, reasoning speed, and deployment cost. M1's million-token context set a record, but its heavy resource consumption limited practical deployment; M2 instead targets frequent, fast-response agent tasks, keeping the context long enough for real work while substantially improving throughput and cost-effectiveness.
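Why a shorter context window cuts deployment cost can be seen from the KV cache, whose memory grows linearly with context length. The back-of-the-envelope calculation below uses illustrative placeholder numbers for layers, heads, and head dimension — not M2's actual architecture:

```python
# Rough KV-cache memory vs. context length.
# All architecture numbers are illustrative placeholders,
# NOT M2's published configuration.
LAYERS = 60      # transformer layers (assumed)
KV_HEADS = 8     # key/value heads (assumed)
HEAD_DIM = 128   # per-head dimension (assumed)
BYTES = 2        # fp16/bf16 bytes per element

def kv_cache_gib(context_tokens: int) -> float:
    """Memory for the K and V caches across all layers, in GiB."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # K and V
    return context_tokens * per_token / 2**30

for ctx in (204_800, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gib(ctx):6.1f} GiB per sequence")
```

Under these assumed numbers, a million-token cache needs almost five times the memory of a 204,800-token one per sequence — memory that directly limits batch size and therefore throughput and cost.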
As an open-source model, M2 further lowers the barrier for developers to build customized agents. Whether it's creating virtual assistants with complex task chains, automated workflow robots, or decision-making agents embedded in enterprise systems, developers can quickly iterate and flexibly optimize based on M2.
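The task-chain agents described above all share the same skeleton: a plan–act–observe loop around the model. Here is a minimal sketch of that loop; `fake_model`, the tool names, and the action format are hypothetical stand-ins, not an actual M2 API:

```python
# Minimal agent loop: plan -> act -> observe, repeated until done.
# `fake_model` is a scripted stand-in for a reasoning-model call;
# the "search" tool and the action dict format are hypothetical.
def fake_model(history):
    """Return the model's next action given the conversation so far."""
    if not any(m["role"] == "tool" for m in history):
        return {"action": "search", "arg": "MiniMax M2 context window"}
    return {"action": "finish", "arg": "M2 supports a 204,800-token context."}

TOOLS = {"search": lambda query: f"result for: {query}"}

def run_agent(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = fake_model(history)
        if step["action"] == "finish":
            return step["arg"]           # agent decides it is done
        observation = TOOLS[step["action"]](step["arg"])
        history.append({"role": "tool", "content": observation})
    return "step budget exhausted"

print(run_agent("What is M2's context window?"))
```

Swapping `fake_model` for a real model call turns this loop into a working agent; the loop's latency is dominated by per-step inference speed, which is exactly where M2's 100 tokens-per-second output matters.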
MiniMax explicitly positions M2 as the "reasoning foundation of the Agent era." As AI moves from question-answering tools to agents that act, the release of M2 is not merely a model upgrade but a bet on the next generation of AI application paradigms: when agents must think quickly, act continuously, and interact efficiently, speed and cost may matter more than context length.
