DeepSeek V4 Chinese Large Model Evaluation: First Place Among Domestic Models Once Again

Published in Latest AI News

Time: Apr 28, 2026

In the SuperCLUE team's latest evaluation of Chinese large models, DeepSeek-V4-Pro reclaimed first place among domestic models on comprehensive performance, with the Flash version close behind in second place. The result marks another step forward for domestic open-source models.

The evaluation covered six dimensions: mathematical reasoning, scientific reasoning, code generation, agent task planning, instruction following, and hallucination control. DeepSeek-V4-Pro led with 70.98 points, and the Flash version scored 68.82 points. Both versions significantly outperformed comparable domestic models.

The DeepSeek V4 series adopts a new attention mechanism, supports a long context of millions of characters, and reduces computing power and memory consumption. As a result, the series is significantly more efficient overall when paired with domestic chips. Compared with the previous V3.2 version, the Pro version improved by more than 20 points in agent capabilities, nearly 10 points in mathematical reasoning, and almost 12 points in instruction following, and it also made notable gains in hallucination control.

The Flash version achieved significant improvements in agent capabilities and mathematical reasoning while keeping inference efficient, which makes it highly cost-effective. The Pro version focuses on high performance and suits complex tasks and professional scenarios; it is priced at 15 yuan per million tokens. The Flash version, known for its speed and low cost, has an API price of only 1.25 yuan per million tokens, making it well suited to daily use.
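To make the price gap concrete, here is a minimal Python sketch that estimates spend at the two quoted prices. The names `PRICE_PER_MILLION_TOKENS` and `estimate_cost` and the 40-million-token workload are illustrative assumptions, not part of the evaluation. The article does not say whether the quoted prices apply to input or output tokens, so the sketch treats each as a single flat rate.

```python
# Prices in yuan per million tokens, as quoted in the article.
# Assumption: one flat rate per version (no input/output split).
PRICE_PER_MILLION_TOKENS = {"pro": 15.0, "flash": 1.25}


def estimate_cost(version: str, tokens: int) -> float:
    """Return the estimated cost in yuan for `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION_TOKENS[version] * tokens / 1_000_000


# Hypothetical workload: 40 million tokens.
workload_tokens = 40_000_000
pro_cost = estimate_cost("pro", workload_tokens)
flash_cost = estimate_cost("flash", workload_tokens)

print(f"Pro:   {pro_cost:.2f} yuan")    # Pro:   600.00 yuan
print(f"Flash: {flash_cost:.2f} yuan")  # Flash: 50.00 yuan
print(f"Pro costs {pro_cost / flash_cost:.0f}x as much as Flash")  # 12x
```

At these prices the Pro version costs 12 times as much per token as the Flash version, so the choice comes down to whether a task needs the Pro version's higher scores.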

Although DeepSeek V4 performs well in multiple areas, the evaluation also pointed out that a gap remains between it and top overseas models in areas such as code generation and complex instruction execution. Overall, with its balanced capabilities and reasonable cost, DeepSeek V4 has firmly established itself in the domestic market as a strong choice for daily office work, development, creative work, and long-text processing.

Key Points:

🌟 DeepSeek-V4-Pro ranks first in the country in the latest evaluation, with the Flash version following closely.

🧠 The evaluation covers six dimensions including mathematical reasoning and scientific reasoning, with the Pro version scoring 70.98 points.

💰 The Pro and Flash versions each have their own characteristics; the former is suitable for complex tasks, while the latter provides high cost-effectiveness and is convenient for daily use.
