The global push of large-model technology toward "embodied intelligence" and advanced agents is accelerating. On June 2, Alibaba announced through the official Qwen channel that it has launched a new multimodal agent model, Qwen3.7-Plus. The release is another step forward for the Tongyi Qianwen series in the multimodal field, and it updates the core foundation that China's domestic large models rely on for edge-side and complex workflow applications.
The core highlight of this upgrade is that Qwen3.7-Plus builds on the native text processing capabilities of Qwen3.7 and substantially advances its vision-language capabilities. The model can better "understand" complex image and video content and can carry that fine-grained visual perception into deeper logical reasoning, which broadens the practical scope of multimodal interaction.
Beyond the visual upgrade, the model retains its strengths in core agent capabilities. In code generation, complex tool use, and high-level productivity workflows, Qwen3.7-Plus demonstrates high task continuity and decision robustness, which lets it adapt more smoothly to enterprise-level automation tasks and long-running intelligent scheduling scenarios.
Industry analysts point out that competition in the second half of the large-model era has clearly shifted toward multimodal and agent-based solutions. By tightly integrating visual understanding with agent action planning, Alibaba's Qwen3.7-Plus further raises the performance ceiling of open-source and commercial models and provides a more promising computing foundation for broader industrial intelligence and embodied robotics applications.
