Moonshot AI recently released Kimi Linear, a hybrid linear attention architecture that, according to its authors, outperforms traditional full attention across short-context, long-context, and reinforcement learning (RL) scenarios. The core innovation is Kimi Delta Attention (KDA), a refinement of Gated DeltaNet that introduces a more efficient, fine-grained gating mechanism to make better use of the limited memory of a finite-state RNN (recurrent neural network).
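To make the "fine-grained gating" idea concrete, here is a minimal, illustrative sketch of one step of a gated delta rule in NumPy. This is an assumption-laden simplification for intuition only, not the actual KDA kernel: the function name, shapes, and the exact placement of the decay gate are choices made for this example. The key point it illustrates is that `alpha` is a per-channel vector (each key channel forgets at its own rate), whereas a coarser gate would be a single scalar.

```python
import numpy as np

def gated_delta_step(S, k, v, beta, alpha):
    """One recurrent step of a simplified gated delta rule (illustrative only).

    S:     (d_k, d_v) fast-weight state matrix (the finite-size RNN memory)
    k:     (d_k,) key vector
    v:     (d_v,) value vector
    beta:  scalar write strength in [0, 1]
    alpha: (d_k,) per-channel decay gates in [0, 1] -- the "fine-grained"
           part: each memory channel decays independently.
    """
    S = alpha[:, None] * S                 # channel-wise forgetting
    pred = S.T @ k                         # what the state currently predicts for k
    S = S + beta * np.outer(k, v - pred)   # delta-rule correction toward v
    return S
```

For example, writing a value under a unit-norm key and reading it back with the same key recovers that value, and setting `alpha` to zeros wipes the state before the next write; the real architecture learns these gates per token.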

Kimi Linear's architecture interleaves layers in a 3:1 pattern: three Kimi Delta Attention modules for every one global MLA (Multi-head Latent Attention) layer. By refining Gated DeltaNet with its fine-grained gating mechanism, KDA substantially compresses the memory footprint of the finite-state RNN. This design speeds up information processing while reducing memory consumption, making the architecture more practical to deploy.
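The 3:1 interleaving described above can be sketched as a simple layer schedule. The helper name and the "every fourth layer is MLA" placement are illustrative assumptions; only the 3:1 ratio itself comes from the article.

```python
# Illustrative hybrid layer schedule: three linear-attention (KDA) layers
# for every one global-attention (MLA) layer, per the article's 3:1 ratio.
# This is a sketch of the pattern, not the actual Kimi Linear code.
def layer_schedule(n_layers, kda_per_mla=3):
    layers = []
    for i in range(n_layers):
        if (i + 1) % (kda_per_mla + 1) == 0:
            layers.append("MLA")  # full global attention layer
        else:
            layers.append("KDA")  # linear-attention layer
    return layers
```

For an 8-layer stack this yields two repetitions of the block `KDA, KDA, KDA, MLA`, which is why only a quarter of the layers need a length-proportional KV cache.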


Official figures show that at 1M-token context lengths, Kimi Linear reduces KV cache usage by up to 75% and increases decoding throughput by up to six times. On TPOT (time per output token, a decoding-latency metric, not training speed), Kimi Linear achieves up to a 6.3x speedup over a traditional MLA baseline. These gains point to broad applicability across AI tasks, especially in scenarios with tight speed and memory budgets.
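The 75% figure is consistent with the 3:1 layer ratio: only the global MLA layers keep a KV cache that grows with sequence length, while KDA layers carry a fixed-size recurrent state. A back-of-the-envelope check (an illustrative calculation, assuming equal per-layer cache cost and ignoring KDA's small constant state):

```python
def kv_cache_fraction(kda_per_block=3, mla_per_block=1):
    # Only the MLA layers retain a length-proportional KV cache;
    # KDA layers use a fixed-size recurrent state instead.
    return mla_per_block / (kda_per_block + mla_per_block)

saved = 1.0 - kv_cache_fraction()  # fraction of KV cache eliminated
```

With the 3:1 block, one quarter of the layers cache KV, so roughly 75% of the cache is eliminated, matching the reported figure.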


As artificial intelligence develops rapidly, improving model processing capability and efficiency has become a key industry challenge. With its innovative design, Moonshot AI's Kimi Linear architecture offers a new solution in this area and may become an industry benchmark in the future.

The full Kimi Linear technical report is available on the project's official GitHub page; interested readers are encouraged to explore its technical details further.

Technical Report: https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf