Recently, Ant Group's Ant Lingbo Technology officially announced the full open-source release of its embodied-intelligence large model, LingBot-VLA, together with its post-training code. The release not only marks a major step forward in robotics but also demonstrates the model's cross-embodiment transfer capability across different types of robots, further promoting the development of intelligent robots.
LingBot-VLA has already been adapted to robots from multiple manufacturers, including Xinghai Tu, Songling, and Leju. Using the post-training toolchain developed by Ant Lingbo Technology, the model trains at 261 samples per second on an 8-GPU setup, 1.5 to 2.8 times the throughput of mainstream frameworks such as StarVLA and OpenPI, effectively lowering data and compute costs.
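As a quick back-of-envelope check on the figures above, the reported 261 samples/second and the 1.5x-2.8x speedup range imply a rough throughput band for the comparison frameworks. This is simple arithmetic on the article's numbers, not official benchmark data:

```python
# Implied baseline throughputs from the reported numbers.
# Only 261 samples/sec and the 1.5x-2.8x range come from the article;
# the derived baselines are our own arithmetic.
lingbot_throughput = 261.0  # samples/sec on 8 GPUs, as reported
speedup_low, speedup_high = 1.5, 2.8

baseline_high = lingbot_throughput / speedup_low   # upper end of implied baseline
baseline_low = lingbot_throughput / speedup_high   # lower end of implied baseline
print(f"Implied baseline range: {baseline_low:.0f}-{baseline_high:.0f} samples/sec")
```

In other words, the claim implies the compared frameworks sit somewhere around 93-174 samples per second under the same 8-GPU configuration.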

Drawing on a large volume of real-world data, Ant Lingbo has systematically studied how VLA models perform on real-robot tasks, finding that the model's success rate on downstream tasks rises as pre-training data grows. From 3,000 hours of training data up to 20,000 hours, the success rate kept climbing, showing a positive correlation between data volume and model performance.
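Scaling trends of this kind are often summarized by fitting success rate against the logarithm of data volume. The sketch below uses entirely hypothetical success rates (the article reports only that the trend is rising) to show how such a fit is computed:

```python
import numpy as np

# Hypothetical success rates at increasing pre-training scales -- purely
# illustrative; the article does not publish per-scale numbers.
hours = np.array([3_000, 5_000, 10_000, 20_000])
success = np.array([0.30, 0.35, 0.41, 0.47])

# Fit success rate against log(data hours); a positive slope captures the
# "more data, higher success rate" relationship described above.
slope, intercept = np.polyfit(np.log(hours), success, 1)
print(f"slope per log-hour: {slope:.3f}")
```

A positive slope formalizes the reported trend; with real per-scale evaluations, the same fit would also reveal whether returns are diminishing.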
More notably, on the GM-100 embodied-intelligence evaluation benchmark developed by Shanghai Jiao Tong University, LingBot-VLA raised the average success rate across different real-robot platforms from 13.0% to 15.7%; after incorporating depth information, the success rate rose further to 17.3%.
In addition, on January 27 Ant Lingbo Technology released the LingBot-Depth spatial-perception model. It focuses on depth completion in real-world scenes, with RGB-Depth data collected and validated using a stereo 3D camera. LingBot-Depth converts noisy, incomplete depth-sensor readings into high-quality 3D measurements, significantly improving environmental depth perception and 3D understanding.
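The article does not describe LingBot-Depth's architecture. For readers unfamiliar with the task, a minimal classical baseline illustrates what "depth completion" means: filling invalid (zero) pixels in a sensor depth map from their valid neighbors. A learned model like LingBot-Depth does this far more accurately by also using the RGB image; the sketch below is only a toy stand-in:

```python
import numpy as np

def fill_missing_depth(depth: np.ndarray) -> np.ndarray:
    """Iteratively fill invalid (zero) pixels with the mean of valid 4-neighbors.

    A toy depth-completion baseline, not LingBot-Depth's actual method.
    """
    filled = depth.astype(float).copy()
    while (filled == 0).any():
        padded = np.pad(filled, 1)  # zero-pad so border pixels have 4 neighbors
        neighbors = np.stack([
            padded[:-2, 1:-1],  # up
            padded[2:, 1:-1],   # down
            padded[1:-1, :-2],  # left
            padded[1:-1, 2:],   # right
        ])
        valid = neighbors > 0
        counts = valid.sum(axis=0)
        sums = np.where(valid, neighbors, 0.0).sum(axis=0)
        fillable = (filled == 0) & (counts > 0)
        if not fillable.any():
            break  # no valid data anywhere near the remaining holes
        filled[fillable] = sums[fillable] / counts[fillable]
    return filled

# Example: a 1-pixel hole between two valid measurements is filled with their mean.
print(fill_missing_depth(np.array([[1.0, 0.0, 3.0]])))  # -> [[1. 2. 3.]]
```

Each pass propagates valid depth one pixel further into the holes, so even large gaps eventually fill, though with none of the edge awareness a learned RGB-guided model provides.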
Across multiple benchmarks, LingBot-Depth has performed strongly on tasks such as depth completion and monocular depth estimation, demonstrating industry-leading accuracy and stability. These validated capabilities bring more precise 3D vision to intelligent terminals such as robots and autonomous vehicles.
