Amid an intensifying price war among AI models, Xiaomi's MiMo large model team announced on May 27 a permanent price cut for the MiMo-V2.5 series API, together with a revised billing system. The stated aim is to pass the efficiency gains from its technology on to developers as lower calling costs.

I. Significant Reduction in API Prices, Up to 99% Off
The new prices took effect globally at 0:00 Beijing Time on May 27. They apply to both core versions, MiMo-V2.5 and MiMo-V2.5Pro, and pricing no longer varies with context window length, which makes the price list simpler and more transparent.

| Model Version | Input Price, Cache Hit (yuan per million tokens) | Maximum Reduction (Input) | Output Price (yuan per million tokens) | Maximum Reduction (Output) |
| --- | --- | --- | --- | --- |
| MiMo-V2.5Pro | 0.025 | 99% | 6 | 86% |
| MiMo-V2.5 | 0.02 | 98% | 2 | 93% |

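
To make the table concrete, the following is a minimal sketch of how a developer might estimate a bill from the listed prices. It covers only cache-hit input tokens and output tokens, because the article does not list a cache-miss input price. The function name `estimate_cost` and the dictionary keys are illustrative assumptions, not official API identifiers.

```python
# Prices in yuan per million tokens, copied from the table above.
# The keys are the model names used in the article, not confirmed API model IDs.
PRICES = {
    "MiMo-V2.5Pro": {"cached_input": 0.025, "output": 6.0},
    "MiMo-V2.5": {"cached_input": 0.02, "output": 2.0},
}


def estimate_cost(model: str, cached_input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in yuan for cache-hit input plus output tokens.

    Cache-miss input tokens are not covered, since the article gives no price for them.
    """
    price = PRICES[model]
    total = (
        cached_input_tokens * price["cached_input"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000
```

Under these assumptions, 10 million cache-hit input tokens plus 1 million output tokens would cost roughly 2.2 yuan on MiMo-V2.5 and roughly 6.25 yuan on MiMo-V2.5Pro.
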
II. Billing System Upgrade: More Value Without Additional Cost
In addition to cutting API unit prices, Xiaomi has also reworked the Token Plan billing system:
Expanded Quota: At the original plan prices, the usable token quota has been raised to 5 to 8 times the previous amount.
Simplified Rules: A new unit, Credits (points), replaces the previous, more complex billing methods, so that developers can understand token consumption and cost calculation more easily.

III. Technical Foundation: Why Can It Continue to Lower Prices?
According to Xiaomi's official statement, the price cut is made possible by technical breakthroughs in its underlying inference system architecture:
SWA Inference Optimization: SGLang HiCache now fully supports SWA (Sliding Window Attention), which cuts the volume of data transferred between GPU memory, CPU memory, and SSD to one seventh of its previous level.
Improved Cache Efficiency: The number of cacheable tokens has increased nearly fivefold compared with the pre-optimization version, which raises the cache hit rate and lowers the cost of each inference.
Cluster Throughput Optimization: Expert parallelism (a parallelization scheme for Mixture-of-Experts, MoE, models) combined with an input length bucketing strategy has substantially raised the cluster's input throughput, so that service quality stays high while the service cost per token keeps falling.
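
The article does not describe how Xiaomi's scheduler implements input length bucketing. As a rough illustration of the general idea only, the sketch below groups requests by prompt length so that each batch holds inputs of similar size, which typically reduces padding and keeps batches balanced. The bucket boundaries and the names `BUCKET_BOUNDS`, `bucket_index`, and `group_by_bucket` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in tokens; the article gives no real values.
BUCKET_BOUNDS = [1024, 4096, 16384, 65536]


def bucket_index(length: int, bounds: list[int] = BUCKET_BOUNDS) -> int:
    """Return the index of the first bucket whose upper bound is >= length.

    A result equal to len(bounds) means the input exceeds every bound
    and falls into an overflow bucket.
    """
    return bisect_left(bounds, length)


def group_by_bucket(
    request_lengths: dict[str, int], bounds: list[int] = BUCKET_BOUNDS
) -> dict[int, list[str]]:
    """Group request IDs by the length bucket of their input."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for request_id, length in request_lengths.items():
        buckets[bucket_index(length, bounds)].append(request_id)
    return dict(buckets)
```

With these bounds, a 200-token and a 1,024-token request share bucket 0, a 3,000-token request lands in bucket 1, and a 100,000-token request goes to the overflow bucket 4. A real serving system would pick the bounds from its observed traffic distribution.
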
Xiaomi's move is seen as a direct response to the current "overcompetition" in large model commercialization. As the price barrier falls further, the cost-effectiveness advantage of the MiMo series will become more prominent, which should speed the adoption of AI capabilities across vertical industries and developer workflows.
