As embodied intelligence moves from the laboratory into the real world, the industry's focus has turned to how robots can accurately understand instructions and carry out tasks autonomously in complex environments. On June 16th, Alibaba officially launched the Qwen-Robot series of embodied intelligence large models, which give a wide range of robots a "general foundation" for understanding natural language, perceiving three-dimensional environments, and grasping physical laws.

The Qwen-Robot series includes three core models that can execute tasks independently or work together, forming the first complete embodied intelligence lineup in the Qwen family.


The first, Qwen-RobotManip, handles core manipulation. Traditional models often suffer significant performance drops when moved to a different robot platform; to address this, Qwen-RobotManip adopts a unified action representation and was pre-trained on more than 38,000 hours of open-source data. In authoritative third-party evaluations, its versions took the top two places in task success rate and performed strongly across tasks of widely varying difficulty, from simply turning on a tap to flipping French fries with two arms.

The second, Qwen-RobotNav, gives robots the ability to "navigate" and "run errands." It unifies five navigation functions, including task instruction understanding, target search, and autonomous driving, within a single framework. Its "task-adaptive observation mechanism" frees robots from rigid, memorized strategies, allowing them to "walk, look, and plan" as they go and to find objects efficiently in complex, unfamiliar spaces.

The third, Qwen-RobotWorld, deepens a robot's ability to "think." It is a physical world model that, much like an athlete mentally rehearsing a movement, predicts the physical state and actions at the next moment. This helps ease the bottleneck of scarce training data and lets robots simulate trajectories before acting, improving the accuracy of their physical operations.