In the SuperCLUE team's latest evaluation of Chinese large models, DeepSeek-V4-Pro reclaimed first place in the country on the strength of its overall performance, with the Flash version close behind in second. The result marks another advance for domestic open-source models.

The evaluation covered six dimensions: mathematical reasoning, scientific reasoning, code generation, agent task planning, instruction following, and hallucination control. DeepSeek-V4-Pro led with 70.98 points and the Flash version scored 68.82, both well ahead of comparable domestic models.

The DeepSeek V4 series adopts a new attention mechanism, supports long contexts of millions of characters, and reduces compute and memory consumption, which significantly improves overall efficiency when the models run on domestic chips. Compared with the previous V3.2 version, the Pro version gained more than 20 points in agent capabilities, nearly 10 points in mathematical reasoning, and almost 12 points in instruction following, and it improved notably in hallucination control.

The Flash version also improved significantly in agent capabilities and mathematical reasoning while keeping inference efficient, making it highly cost-effective. The Pro version targets high performance for complex tasks and professional scenarios and is priced at 15 yuan per million tokens. The Flash version, positioned for speed and low cost, has an API price of only 1.25 yuan per million tokens, making it well suited to daily use.
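
To make the price difference concrete, here is a minimal Python sketch that estimates cost from the two quoted per-million-token prices. It assumes a single flat rate per version, because the article does not say whether input and output tokens are billed differently. The function name `estimate_cost_yuan` and the 50-million-token workload are hypothetical, chosen only for illustration.

```python
# Prices quoted in the article, in yuan per million tokens.
# Assumption: one flat rate per version; real billing may differ.
PRICE_PER_MILLION_TOKENS = {"pro": 15.0, "flash": 1.25}


def estimate_cost_yuan(version: str, tokens: int) -> float:
    """Estimate the cost in yuan of processing `tokens` tokens at the flat quoted rate."""
    return PRICE_PER_MILLION_TOKENS[version] * tokens / 1_000_000


# Hypothetical workload: 50 million tokens.
workload_tokens = 50_000_000
pro_cost = estimate_cost_yuan("pro", workload_tokens)
flash_cost = estimate_cost_yuan("flash", workload_tokens)
price_ratio = PRICE_PER_MILLION_TOKENS["pro"] / PRICE_PER_MILLION_TOKENS["flash"]

print(f"Pro:   {pro_cost:.2f} yuan")    # Pro:   750.00 yuan
print(f"Flash: {flash_cost:.2f} yuan")  # Flash: 62.50 yuan
print(f"Pro costs {price_ratio:.0f}x as much as Flash per token")  # 12x
```

At these quoted prices the Pro version costs 12 times as much per token as the Flash version, which is the trade-off behind the article's split between professional scenarios and daily use.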

Although DeepSeek V4 performs well in multiple areas, the evaluation also noted that it still trails top overseas models in areas such as code generation and the execution of complex instructions. Overall, with its balanced capabilities and reasonable cost, DeepSeek V4 has firmly established itself in the domestic market as a strong choice for daily office work, development and creative work, and long-text processing.

Key Points:
🌟 DeepSeek-V4-Pro ranks first in the country in the latest evaluation, with the Flash version following closely.
🧠 The evaluation covers six dimensions including mathematical reasoning and scientific reasoning, with the Pro version scoring 70.98 points.
💰 The Pro version targets complex tasks at 15 yuan per million tokens, while the Flash version offers high cost-effectiveness for daily use at 1.25 yuan per million tokens.
