Alibaba has officially released Qwen3.7-Plus, a new multimodal large model. Building on the strong text capabilities of Qwen3.7, it comprehensively upgrades vision-language capabilities and unifies the two in a single foundation model. Positioned as a multimodal hybrid agent, Qwen3.7-Plus combines GUI (graphical user interface) and CLI (command-line interface) interaction to automate work end to end, from front-end prototypes to complex software engineering.

On Vision Arena, a widely followed leaderboard for vision models, the strong performance of Qwen3.7-Plus has placed Alibaba in the global top five and first in China.

Core Technical Capabilities and Evaluation Performance

The core advantage of Qwen3.7-Plus lies in integrating "seeing, thinking, writing, doing, and verifying" into a unified cycle, demonstrating top-tier performance in three key areas:

  • Text and Reasoning Agent: Performs strongly on complex software engineering and scientific programming benchmarks such as Terminal-Bench 2.0, SWE-bench, and SciCode, and ranks among the top Plus-level models on high-difficulty STEM reasoning benchmarks such as GPQA Diamond.

  • Multimodal Reasoning and Visual Programming: Shows strong spatial modeling and path search capabilities, with significantly improved results on BabyVision. It also supports one-click conversion of images, videos, and UI screenshots into executable code, for example SVG re-creation and interactive web design.

  • Real-World Perception and Video Understanding: Covers document parsing, advanced OCR, and event understanding in both long and short videos, and shows precise understanding of dynamic spatial relationships on driving-scenario benchmarks such as LingoQA.

Disruptive Real-World Applications

The launch also showcased several cutting-edge agent systems built on Qwen3.7-Plus:

  1. Full-Stack Autonomous App Development: In testing, the Hybrid-Agent system ran continuously for over 11 hours, triggered more than 1,000 calls, and autonomously generated over 10,000 lines of code. It completed the full development cycle of an English vocabulary learning app, from requirements documentation to testing and deployment, without any human intervention.

  2. High-Fidelity Replication of Desktop Applications: The agent autonomously interacted with the native macOS "Stocks" application, worked out its layout, wrote the SwiftUI source code, and integrated real-time market data APIs. The result passed all 10 functional verification tests and reproduced the original dark theme and interaction experience.

  3. Unattended Cloud Console Operations: The browser extension "Qwen for Chrome", built on Qwen3.7-Plus, can understand natural language requests from non-expert users and autonomously work in the Alibaba Cloud console to price, select, configure, and purchase ECS servers. It can also independently handle complex maintenance tasks such as shutdown and expansion.

Qwen3.7-Plus is now officially available through Alibaba Cloud Bailian and Qwen Studio. Whether it is deployed through Claude Code, OpenClaw, or Qwen Code, the model maintains stable performance across frameworks, laying a solid foundation for next-generation embodied scenarios and productivity workflow automation.