According to the latest evaluation report released by SuperCLUE-VLM in April 2026, the landscape of Chinese multimodal vision-language models has undergone a structural shift. In a comprehensive assessment of 17 mainstream large models from around the world, Chinese models showed strong momentum. They not only demonstrated clear advantages in understanding Chinese-language contexts but also overtook leading international models in overall score.

ByteDance Tops the List, Several Domestic Models Enter the First Tier

The results show that ByteDance's Doubao-Seed-2.0-Pro-260215 scored 90.66, taking first place in the overall ranking and edging out the highly anticipated Google Gemini-3.1-Pro-Preview (89.35). Other Chinese models, including Alibaba's Qwen3.5 series, SenseTime's SenseNova, and Zhipu's GLM, also performed well and secured places near the top. By contrast, well-known overseas models such as OpenAI's GPT-5.4 and xAI's Grok placed only in the middle of the ranking in this Chinese multimodal evaluation.

Chinese Vision-Language Model Rankings Reshuffled: Doubao Tops the Overall Ranking as Domestic Models Overtake Overseas Rivals

A Closer Look at Three Dimensions: Basic Cognition Has Matured

The evaluation covered three core dimensions: basic cognition, visual reasoning, and visual application. These were tested across 25 task scenarios, including general recognition, chart analysis, and medical imaging. Chinese models performed especially well in basic cognition and data analysis, where their scores generally exceeded 90, reflecting mature technology and strong adaptation to Chinese-language contexts.

Challenges Remain in Vertical Domains: Industrial and Medical Reasoning Are the Next Focus

Despite leading the overall ranking, Chinese models still have clear weaknesses according to the evaluation data. In highly specialized visual reasoning tasks such as industrial inspection and high-precision medical imaging, they still trail the world's best models, and scores in some specific scenarios fluctuate significantly.

Industry analysts believe the latest ranking changes mark a key technological turning point for Chinese multimodal AI. Chinese large models have built a solid competitive advantage in deeply understanding and applying Chinese-language scenarios, and they are now entering a new phase in which they compete directly with international giants and, in some areas, lead them.