Xiaomi has announced the open-sourcing of the full real-world post-training pipeline for its vision-language-action (VLA) large model, Xiaomi-Robotics-0. The move marks an important step for Xiaomi in embodied intelligence, aiming to let robots master complex manipulation skills quickly and with minimal data.
20 Hours of Data to Master a "Needle-Threading" Task
Starting from a pre-trained foundation model, the research team used only about 20 hours of task data for real-robot post-training, enabling the robot to perform the high-difficulty action of accurately inserting earphones into their charging case. The task demands very precise spatial perception, and the smooth, low-friction surfaces of the earphones and case cause them to slip and shift during insertion.
The model must align parts within sub-millimeter tolerances and correct deviations in real time as they arise. This smooth, continuous execution demonstrates the potential of Xiaomi-Robotics-0 for high-precision assembly tasks.
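To make the idea of "post-training on a small set of demonstrations" concrete, here is a deliberately toy sketch, not Xiaomi's actual code or API: post-training in this setting typically means supervised fine-tuning (behavior cloning) of a pretrained policy on observation-action pairs collected from task demonstrations. All names and the linear policy are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of behavior-cloning post-training.
# A toy "pretrained" linear policy maps observations to actions: a = W @ o.
rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 4
W = rng.normal(scale=0.1, size=(act_dim, obs_dim))  # stand-in for pretrained weights

# A small demonstration dataset, standing in for ~20 hours of task data.
demos_obs = rng.normal(size=(256, obs_dim))
W_expert = rng.normal(size=(act_dim, obs_dim))      # unknown expert behavior
demos_act = demos_obs @ W_expert.T

def mse(W):
    """Mean squared error between the policy's actions and the demos."""
    pred = demos_obs @ W.T
    return float(np.mean((pred - demos_act) ** 2))

loss_before = mse(W)
lr = 0.01
for _ in range(500):
    # Gradient of the MSE loss with respect to W, then a gradient step.
    pred = demos_obs @ W.T
    grad = 2.0 / len(demos_obs) * (pred - demos_act).T @ demos_obs
    W -= lr * grad
loss_after = mse(W)
```

A real VLA policy is of course a large transformer rather than a linear map, but the structure of the post-training loop (demonstration dataset, imitation loss, gradient updates from pretrained weights) is the same.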

Open Source Ecosystem Drives Productivity Evolution
To make the model a genuinely ready-to-use tool, Xiaomi released not only the model weights but also the technical report and source code. This end-to-end open-source release substantially lowers the barrier to entry for developers working on embodied intelligence.
The model has already ranked among the most-downloaded on major international model-hosting platforms. With the post-training pipeline now public, developers worldwide can jointly refine robots' perception and execution logic, accelerating the adoption of AI robots in production and daily life.
Project Website: https://robotics.xiaomi.com/xiaomi-robotics-0.html
Open Source Code: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0
