SuperCLUE has officially released its "2025 Annual Chinese Large Model Benchmark Evaluation Report." This "all-star game," featuring 23 top domestic and international models, once again reveals new trends in the global AI race. The evaluation covers six core dimensions, including mathematical reasoning, code generation, and scientific reasoning, giving a clear picture of how the major models actually perform in the Chinese language context.


Judging from the overall ranking, overseas closed-source models remain clearly dominant. Anthropic's Claude-Opus-4.5-Reasoning took first place with the highest score of 68.25, followed by Google's Gemini-3-Pro-Preview and OpenAI's GPT-5.2 (high) in second and third. These three giants form the "first tier," maintaining a slight edge in logical rigor and comprehensive understanding.

The real surprise, however, is how quickly domestic large models are closing the gap. The leading open-source model Kimi-K2.5-Thinking and the flagship closed-source model Qwen3-Max-Thinking both entered the global top ten, ranking fourth and sixth respectively. Even more encouraging, domestic models have already achieved breakthroughs in specialized areas: Kimi took first place globally in code generation, while Qwen3 tied Google for the top score in mathematical reasoning.

Taken as a whole, the competitive dynamics in the domestic and overseas markets are starkly different. In the closed-source arena, overseas models currently lead and domestic models are catching up; in the open-source arena, domestic models hold a commanding lead, with the top five domestic open-source models significantly outperforming their overseas counterparts. This parallel advance of open- and closed-source development suggests the Chinese AI ecosystem is entering a period of high-quality growth.

Key points:

  • 🏆 Overseas Giants Lead: Claude-Opus-4.5-Reasoning tops the Chinese large model evaluation with the highest overall score (68.25), and overseas closed-source models still occupy the top three positions.

  • 🚀 Domestic Models Break Through in Specialized Areas: Kimi-K2.5-Thinking takes first place in code generation, while Qwen3-Max-Thinking ties Google for the top score in mathematical reasoning.

  • 📊 Domestic Models Dominate Open Source: In the open-source group, domestic models far outperform their overseas competitors, highlighting the strength of the domestic large model ecosystem in open collaboration.