Chinese AI Vision Models Surge Ahead, Doubao Surpasses Google to Rank First Globally

According to the latest evaluation report released by SuperCLUE-VLM in April 2026, the landscape of Chinese-language multimodal vision-language models has shifted structurally. In a comprehensive assessment of 17 mainstream large models from around the world, Chinese models showed strong momentum: they not only demonstrated clear advantages in understanding Chinese-language contexts but also overtook top international models in overall score.

ByteDance Tops the List, Several Domestic Models Enter the First Tier

The evaluation results show that ByteDance's Doubao-Seed-2.0-Pro-260215 scored 90.66, taking first place in the overall ranking and surpassing Google's highly anticipated Gemini-3.1-Pro-Preview (89.35 points). Domestic models such as Alibaba's Qwen3.5 series, SenseTime's SenseNova, and Zhipu's GLM also performed well and placed near the top. By contrast, well-known overseas models such as OpenAI's GPT-5.4 and xAI's Grok ranked only mid-table in this Chinese-language multimodal test.

Three Dimensions Analyzed in Depth, Basic Cognition Already Mature

The evaluation framework is rigorous, covering three core dimensions: basic cognition, visual reasoning, and visual application. Its tasks span 25 scenarios, including general recognition, chart analysis, and medical imaging. Domestic models performed particularly well in the "basic cognition" and "data analysis" areas, generally scoring above 90 points, which indicates a high level of technical maturity and strong adaptation to Chinese-language settings.

Challenges Remain in Vertical Fields, Industrial and Medical Reasoning Become Future Key Points

Although domestic models lead the overall ranking, the evaluation data also exposes their weaknesses. In highly specialized "visual reasoning" tasks such as industrial inspection and high-precision medical imaging, they still trail the best global results, and scores fluctuate significantly in some scenarios.

Industry analysts believe the latest ranking changes mark a key technological turning point for Chinese multimodal AI. Domestic large models have established a solid competitive advantage in deep understanding and application within Chinese-language scenarios, entering a new phase in which they compete head-on with international giants and lead in some areas.
