Amid the rapid development of artificial intelligence, MiniMax M2, a new pre-trained model, has attracted considerable attention. Its use of a full attention mechanism has sparked widespread discussion, with many technical experts and enthusiasts asking: "Why not continue to develop linear or sparse attention technologies?" In response, the pre-training lead of MiniMax M2 has set out to explain the reasons behind this decision.
First, the development team believes that, in the current industrial environment, linear and sparse attention technologies have the potential to save computing resources, but it will still take some time before they can completely replace the full attention mechanism. Large language models (LLMs) face a variety of complex scenarios in practical applications, such as code parsing, mathematical calculations, and multimodal data processing. Evaluating a model's performance therefore requires not only theoretical support but also validation in real-world use.
Second, although researchers have been exploring more efficient attention mechanisms, models that perform well often depend on excellent engineering optimization. The MiniMax M2 team is well aware that model effectiveness, speed (tokens per second, TPS), and cost are the three aspects that matter most to users. To improve model performance, researchers must overcome the shortcomings of current evaluation systems and the high cost of observing results.
Finally, the MiniMax M2 team also faces infrastructure challenges. Compared with full attention, the infrastructure for linear and sparse attention is relatively immature, and developers need to invest more effort to achieve performance gains. As computing resources become more constrained and the demand for data processing grows, the advantages of linear and sparse attention may gradually become apparent. The team is therefore preparing for this transition in advance.
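As a rough illustration of why that advantage tends to grow with longer inputs, the sketch below compares the approximate arithmetic cost of full attention with that of a generic, non-causal linear attention formulation for a single head. It is not MiniMax's method or measurement: the function names and the head dimension of 128 are hypothetical, and the estimate ignores softmax, masking, multiple heads, and memory bandwidth, which often dominate real performance.

```python
def full_attention_flops(seq_len: int, head_dim: int) -> int:
    """Approximate FLOPs for one head of full attention (assumed model)."""
    # Q @ K^T: seq_len x seq_len scores, each a dot product of length head_dim.
    scores = 2 * seq_len * seq_len * head_dim
    # weights @ V: another seq_len x seq_len x head_dim multiply-accumulate.
    weighted_sum = 2 * seq_len * seq_len * head_dim
    return scores + weighted_sum


def linear_attention_flops(seq_len: int, head_dim: int) -> int:
    """Approximate FLOPs for one head of a generic linear attention (assumed model)."""
    # K^T @ V: a head_dim x head_dim summary accumulated over all positions.
    kv_summary = 2 * seq_len * head_dim * head_dim
    # Q @ (K^T V): each position reads from the fixed-size summary.
    readout = 2 * seq_len * head_dim * head_dim
    return kv_summary + readout


def cost_ratio(seq_len: int, head_dim: int) -> float:
    """How many times more arithmetic full attention needs under this model."""
    return full_attention_flops(seq_len, head_dim) / linear_attention_flops(seq_len, head_dim)


if __name__ == "__main__":
    head_dim = 128  # hypothetical value chosen for illustration only
    for seq_len in (1_024, 32_768, 262_144):
        print(f"seq_len={seq_len}: ratio={cost_ratio(seq_len, head_dim):.1f}")
```

Under these simplifying assumptions the ratio reduces to the sequence length divided by the head dimension, so the gap is small for short inputs and widens as inputs grow. Whether that theoretical saving translates into real speed depends heavily on the maturity of kernels and serving infrastructure.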
The MiniMax M2 team will continue to explore more efficient model architectures and optimize its existing infrastructure to meet future computing needs. The team remains passionate about technological exploration and hopes to launch more competitive products in the near future.
