Xiaomi MiMo-V2.5 Series API Permanent Price Drop, Up to 99% Discount

Amid an intensifying price war among AI models, Xiaomi's MiMo large model team officially announced on May 27 a permanent price reduction for the MiMo-V2.5 series API, together with an overhaul of its billing system. The stated aim is to pass the efficiency gains from its inference technology on to developers as lower calling costs.

I. Significant Reduction in API Prices, Up to 99% Off

The price adjustment took effect globally at 0:00 Beijing Time on May 27. It covers the two core versions, MiMo-V2.5 and MiMo-V2.5Pro, and no longer varies by context window length, which makes the pricing simpler and more transparent.

Model Version	Input Price (Cache Hit)	Maximum Discount (Input)	Output Price	Maximum Discount (Output)
MiMo-V2.5Pro	0.025 yuan per million tokens	99%	6 yuan per million tokens	86%
MiMo-V2.5	0.02 yuan per million tokens	98%	2 yuan per million tokens	93%
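
As a rough illustration of what these unit prices mean for a bill, the sketch below estimates the cost of a workload from the two prices listed in the table. The function name and the example token counts are hypothetical, and the estimate covers only cache-hit input and output tokens, because the article does not list a cache-miss input price.

```python
from decimal import Decimal

# Prices in yuan per million tokens, copied from the table above.
# Only the cache-hit input price and the output price are given in the article.
PRICES = {
    "MiMo-V2.5Pro": {"cached_input": Decimal("0.025"), "output": Decimal("6")},
    "MiMo-V2.5": {"cached_input": Decimal("0.02"), "output": Decimal("2")},
}
MILLION = Decimal(1_000_000)


def estimate_cost(model, cached_input_tokens, output_tokens):
    """Estimate cost in yuan for cache-hit input tokens plus output tokens.

    Hypothetical helper: cache-miss input tokens are not covered because
    their price is not stated in the article.
    """
    price = PRICES[model]
    total = (price["cached_input"] * cached_input_tokens
             + price["output"] * output_tokens)
    return total / MILLION


if __name__ == "__main__":
    # Example workload (assumed numbers): 2M cached input tokens, 500K output tokens.
    for name in PRICES:
        print(name, estimate_cost(name, 2_000_000, 500_000), "yuan")
```

In this example almost all of the estimated cost comes from output tokens, which is why the output price matters more than the headline 99% input discount for generation-heavy workloads.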

II. Billing System Upgrade: More Value Without Additional Cost

In addition to the direct reduction in API unit prices, Xiaomi has also deeply optimized the Token Plan billing system:

Increased Quota: With plan prices unchanged, the actual token usage quota has been raised to 5 to 8 times the previous amount.
Simplified Rules: A new unit, Credits (points), replaces the previous complex billing methods, so developers can understand token consumption and cost calculation more intuitively.

III. Technical Foundation: Why Can It Continue to Lower Prices?

According to Xiaomi's official statement, the price cut rests on technical breakthroughs in its underlying inference system architecture:

SWA Inference Optimization: With SGLang HiCache now fully supporting SWA (sliding window attention), the volume of data transferred between GPU memory, CPU memory, and SSD has dropped to 1/7 of its previous level.
Improved Cache Efficiency: The number of cacheable tokens has increased nearly fivefold compared with before the optimization, which raises the cache hit rate and lowers the cost of each inference.
Cluster Throughput Optimization: By introducing expert parallelism for MoE (Mixture of Experts) models and an input length bucketing strategy, the cluster's input throughput has improved substantially, maintaining service quality while continuing to lower the serving cost per token.
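
The article does not describe how Xiaomi implements input length bucketing, so the following is only a minimal sketch of the general idea: requests with similar input lengths are grouped so that a batch is not dominated by its longest member. The bucket boundaries, function names, and the padding-based waste metric are all assumptions made for illustration. Real serving systems may avoid padding entirely.

```python
from collections import defaultdict

# Hypothetical bucket boundaries in tokens; the article does not state real values.
BUCKET_LIMITS = [512, 2048, 8192, 32768]


def bucket_for(length, limits=BUCKET_LIMITS):
    """Return the smallest bucket limit that fits the request, or None if too long."""
    for limit in limits:
        if length <= limit:
            return limit
    return None


def group_by_bucket(lengths, limits=BUCKET_LIMITS):
    """Group request input lengths by the bucket they fall into."""
    buckets = defaultdict(list)
    for n in lengths:
        buckets[bucket_for(n, limits)].append(n)
    return dict(buckets)


def padding_waste(batch):
    """Unused token slots if every request is padded to the longest one in the batch."""
    return max(batch) * len(batch) - sum(batch)


if __name__ == "__main__":
    lengths = [100, 480, 1500, 7000, 300, 1900]  # assumed example requests
    grouped = group_by_bucket(lengths)
    print("one mixed batch:", padding_waste(lengths))
    print("bucketed batches:", sum(padding_waste(b) for b in grouped.values()))
```

Under this simplified padding model, batching the six example requests together wastes far more token slots than batching them per bucket, which is the intuition behind bucketing as a throughput optimization.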

Xiaomi's move is seen as an active response to the current "overcompetition" in large model commercialization. As price barriers fall further, the cost-effectiveness advantage of the MiMo series will become more prominent, which could accelerate the adoption of AI capabilities across vertical industries and developer workflows.

