Following yesterday's open-source release of the high-precision spatial perception model LingBot-Depth, Ant Group's Lingbo Technology today announced the full open-source release of its embodied large model LingBot-VLA. Positioned as an "intelligent foundation" for real-world robot manipulation, LingBot-VLA delivers cross-embodiment and cross-task generalization, substantially reducing post-training costs and advancing the engineering of "one brain, multiple machines."
On the GM-100 embodied evaluation benchmark (100 real-world manipulation tasks) developed by Shanghai Jiao Tong University, LingBot-VLA achieved an average cross-embodiment success rate of 15.7% (w/o Depth) across three different real robot platforms, surpassing Pi0.5's 13.0%. With depth information added (w/ Depth), spatial perception improves and the average success rate rises further to 17.3%, setting a new record for real-robot evaluation and confirming the model's advantage in real-world scenarios.

(Figure: On the GM-100 real-robot evaluation, LingBot-VLA outperforms Pi0.5 in cross-embodiment generalization)
On the RoboTwin2.0 simulation benchmark (50 tasks), under heavy environment randomization (lighting, clutter, and height perturbations), LingBot-VLA tightly integrates depth information through its learnable-query alignment mechanism, achieving a manipulation success rate 9.92% higher than Pi0.5 and demonstrating consistent performance leadership from simulation to real-world deployment.
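The release does not describe the learnable-query alignment mechanism in detail. As a rough illustration only, such a mechanism is commonly built as a small set of trained query vectors that cross-attend to depth features and feed the aligned result to the policy alongside RGB tokens; the PyTorch sketch below follows that assumption, and the module name, dimensions, and wiring are hypothetical rather than LingBot-VLA's actual code.

```python
import torch
import torch.nn as nn

class LearnableQueryDepthAlignment(nn.Module):
    """Illustrative sketch: a fixed set of learnable queries cross-attends to
    depth-feature tokens, and the aligned result is fused with RGB tokens.
    Names and shapes are assumptions, not LingBot-VLA's implementation."""

    def __init__(self, dim: int = 768, num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        # Learnable query vectors that "ask" the depth features for
        # spatially relevant information.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, depth_tokens: torch.Tensor, rgb_tokens: torch.Tensor) -> torch.Tensor:
        # depth_tokens: (B, N_depth, dim) from a depth encoder (e.g. LingBot-Depth features)
        # rgb_tokens:   (B, N_rgb, dim)   from the VLA's vision backbone
        B = depth_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)   # (B, num_queries, dim)
        aligned, _ = self.cross_attn(q, depth_tokens, depth_tokens)
        aligned = self.norm(self.proj(aligned))
        # Concatenate the depth-aligned query tokens with the RGB tokens so the
        # downstream action decoder can attend to both modalities.
        return torch.cat([rgb_tokens, aligned], dim=1)
```

In a design like this, the learned queries act as a fixed-size bottleneck, so the cost of fusing depth does not grow with the number of depth tokens.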

(Figure: On the RoboTwin2.0 simulation evaluation, LingBot-VLA outperforms Pi0.5 in cross-task generalization)
Embodied models have long faced serious generalization challenges at deployment because of differences in robot embodiments, tasks, and environments. Developers typically have to collect large amounts of data and post-train separately for each hardware platform and task, which drives up deployment costs and prevents the industry from forming a scalable, repeatable delivery path.
To address this, LingBot-VLA is pre-trained on more than 20,000 hours of large-scale real-robot data covering 9 mainstream dual-arm robot configurations (including AgileX, Galaxea R1Pro, R1Lite, AgiBot G1, etc.), allowing the same "brain" to migrate seamlessly across robot configurations while maintaining usability, success rate, and robustness as tasks and environments change. Combined with the high-precision spatial perception model LingBot-Depth, LingBot-VLA obtains higher-quality depth representations; by upgrading its "vision," it genuinely achieves "seeing clearly and doing better."
LingBot-VLA significantly lowers the adaptation threshold for downstream tasks, requiring only 80 demonstrations to achieve high-quality task transfer. In addition, thanks to extensive optimization of the underlying codebase, its training efficiency reaches 1.5 to 2.8 times that of mainstream frameworks such as StarVLA and OpenPI, cutting both data and compute costs.
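The release does not publish the post-training recipe. For illustration only, adapting a pretrained VLA to a new task with a few dozen demonstrations is usually a short behavior-cloning run along the lines of the hypothetical sketch below, in which the policy interface, dataset, and hyperparameters are assumptions rather than LingBot-VLA's published API.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical objects: `policy` is a pretrained VLA exposing a loss method, and
# `demo_dataset` wraps ~80 teleoperated demonstrations (observations, language
# instruction, action chunks). Neither name comes from the LingBot-VLA release.
def finetune_on_demos(policy, demo_dataset, epochs: int = 50, lr: float = 1e-4):
    loader = DataLoader(demo_dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(policy.parameters(), lr=lr, weight_decay=1e-2)
    policy.train()
    for epoch in range(epochs):
        for batch in loader:
            # Behavior cloning: regress the expert action chunk from images,
            # (optional) depth, proprioceptive state, and the instruction.
            loss = policy.compute_loss(
                images=batch["images"],
                depth=batch.get("depth"),
                state=batch["state"],
                instruction=batch["instruction"],
                actions=batch["actions"],
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```

A run over such a small demonstration set is short enough that codebase-level efficiency gains translate directly into faster adaptation cycles.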
This open-source release provides not only the model weights but also a complete codebase covering data processing, efficient fine-tuning, and automated evaluation. This greatly shortens the model training cycle, lowers the compute and time barriers to commercial deployment, and helps developers adapt the model to their own scenarios quickly and at lower cost, significantly improving its practicality.
Zhu Xing, CEO of Ant Lingbo Technology, said: "Wide application of embodied intelligence depends on efficient embodied base models, which directly determine whether it is usable and affordable. Through the open-source release of LingBot-VLA, we hope to explore the upper limits of embodied intelligence, push the field into a new stage that is reusable, verifiable, and scalable, and accelerate AI's penetration into the physical world so that it serves everyone sooner."
LingBot-VLA is the first embodied-intelligence base model open-sourced by Ant and another milestone in its AGI research. Zhu Xing explained that Ant Group is committed to exploring AGI through an open-source, open model and has therefore built InclusionAI, a complete technical system and open-source ecosystem spanning foundation models, multimodality, reasoning, novel architectures, and embodied intelligence. The open-source release of LingBot-VLA is a key practice under InclusionAI. "We look forward to working with global developers to accelerate the iteration and large-scale application of embodied-intelligence technology and help AGI arrive sooner."
According to the announcement, LingBot-VLA used hardware platforms from Xinghai Tu and Songling during data collection. Leju, Kupasi, the National-Local Co-construction Humanoid Robot Innovation Center, Beijing Humanoid Robot Innovation Center Co., Ltd., Boden Intelligence, and Ruierman also provided high-quality data support during pre-training. LingBot-VLA has so far been adapted to robots from Xinghai Tu, Songling, and Leju, verifying its cross-embodiment transfer capability on different robot configurations.
