On April 16, Ant Lingbo Technology officially announced the open-sourcing of LingBot-Map, a streaming 3D reconstruction model. The model overcomes prior limitations by estimating camera pose and reconstructing scene 3D structure in real time from a single ordinary RGB camera while video is being captured. This advance provides efficient, stable, and continuous online mapping for applications that require real-time spatial perception, such as robot navigation, autonomous driving, and AR hardware.

Technically, LingBot-Map adopts a streaming architecture: instead of waiting for the complete sequence before processing, as traditional methods do, it ingests frames as they arrive and outputs camera poses and scene structure in real time. The model performs strongly on mainstream international benchmarks: on the highly challenging Oxford Spires dataset, its trajectory error is only one-third that of the previous best streaming method, and it even surpasses some offline algorithms. Performance figures show LingBot-Map sustains real-time inference at about 20 FPS with almost no loss of accuracy on long videos of tens of thousands of frames, balancing high precision, speed, and long-term stability.
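The streaming idea described above — emitting a pose estimate for each frame as it arrives while keeping only a bounded amount of state, rather than buffering the whole sequence for offline optimization — can be sketched as follows. This is an illustrative stand-in, not LingBot-Map's actual API: the estimator is a dummy function, and the frame format and pose representation are invented for the example.

```python
from collections import deque

# Hypothetical per-frame estimator standing in for a streaming model.
# A real system would run a neural network here; this stub just
# integrates a small per-frame motion so the loop's bookkeeping
# can be demonstrated.
def estimate_frame(frame, prev_pose):
    dx, dy = frame  # pretend each "frame" encodes a small 2D motion
    x, y = prev_pose
    return (x + dx, y + dy)

def streaming_reconstruct(frames, window=5):
    """Process frames one at a time, keeping only a bounded window of
    recent state, so memory stays constant for arbitrarily long videos
    (unlike offline methods that must hold the whole sequence)."""
    pose = (0.0, 0.0)
    recent = deque(maxlen=window)  # bounded buffer -> O(1) memory
    trajectory = []
    for frame in frames:
        pose = estimate_frame(frame, pose)  # pose is output as each frame arrives
        recent.append(frame)
        trajectory.append(pose)
    return trajectory

# A short synthetic "video": constant motion of (1, 0) per frame.
traj = streaming_reconstruct([(1.0, 0.0)] * 4)
print(traj[-1])  # (4.0, 0.0)
```

The key property mirrored here is that output is available after every frame and memory does not grow with video length, which is what allows a streaming method to stay stable over tens of thousands of frames.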
The release of LingBot-Map marks another important step for Ant Lingbo following its work on depth estimation (Depth), vision-language-action models (VLA), and world models (World). By completing the core link of real-time spatial understanding, Ant Lingbo further strengthens its embodied-intelligence "foundation". Open-sourcing the model not only lowers the hardware barrier to high-precision 3D perception but also accelerates the evolution of perception and decision-making in embodied intelligent devices operating in complex, dynamic environments.
