Robotics technology is at a critical turning point in its shift from "industrial automation" to "physical AI." According to IT Home, Microsoft Research has officially launched a new AI model called Rho-alpha. The model aims to free robots from their dependence on closed, pre-configured environments, enabling them to perform well in complex, changing, and unpredictable real-world scenarios.

As a core achievement of Microsoft's "physical AI" strategy, Rho-alpha demonstrates striking interaction capabilities. It can directly interpret natural language commands and convert them into precise control signals, guiding robots through highly complex hand-coordination tasks. This means future robots will no longer require painstakingly written code scripts; a single sentence is enough for them to understand and carry out an operation much as a human would.
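
To make the idea concrete, here is a minimal sketch of what a language-conditioned control loop could look like. Rho-alpha's actual interface has not been published, so `RhoAlphaPolicy`, `JointCommand`, and every method below are hypothetical stand-ins for a vision-language-action model, not Microsoft's API.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class JointCommand:
    joint_angles: List[float]  # target joint angles in radians
    gripper_force: float       # normalized grip force, 0.0-1.0

class RhoAlphaPolicy:
    """Hypothetical stand-in: maps (instruction, camera frame) to an action."""
    def predict(self, instruction: str, image: Any) -> JointCommand:
        # A real vision-language-action model would run inference here;
        # this stub just returns a fixed, safe action.
        return JointCommand(joint_angles=[0.0] * 7, gripper_force=0.3)

def run_instruction(policy: RhoAlphaPolicy, instruction: str, steps: int = 100) -> None:
    # Re-query the model every control tick with the latest observation.
    for _ in range(steps):
        frame = None  # placeholder for the latest camera frame
        action = policy.predict(instruction, frame)
        # send_to_robot(action)  # hardware interface intentionally omitted

run_instruction(RhoAlphaPolicy(), "pick up the red mug and place it on the shelf")
```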

In terms of perception, Rho-alpha goes further still. It not only inherits the strong vision and language capabilities of the Phi model family but also integrates tactile perception for the first time. When a robot grasps an object, it can adjust its force and posture in real time based on actual touch feedback. Microsoft says additional modalities, such as force perception, will be added in the future, raising the robot's manipulation precision to a new level.
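
The closed-loop pattern the article describes, adjusting grip force from touch feedback, is a standard one in robotics. The sketch below shows a simple proportional controller acting on a simulated tactile reading; the target force, gain, and sensor model are illustrative assumptions, not details Microsoft has disclosed.

```python
import random

TARGET_FORCE = 2.0  # desired contact force in newtons (assumed value)
GAIN = 0.2          # proportional gain (assumed value)

def read_tactile_sensor(commanded_force: float) -> float:
    """Simulated tactile sensor: measured force tracks the command, with noise."""
    return commanded_force + random.gauss(0.0, 0.05)

grip_force = 0.5  # start with a light grip
for _ in range(50):
    measured = read_tactile_sensor(grip_force)
    error = TARGET_FORCE - measured
    grip_force = max(0.0, grip_force + GAIN * error)  # correct; never negative

print(f"settled grip force: {grip_force:.2f} N")
```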

To make robots smarter and easier to work with, Rho-alpha introduces an adaptive mechanism for dynamically adjusting behavior. During operation, if the robot performs poorly, a human operator can intervene and correct it through a 3D input device, and the system incorporates that feedback into its learning process in real time. By combining large-scale simulation data generated on Azure infrastructure with real-world demonstration data, Rho-alpha is accelerating its evolution toward a genuinely preference-aware intelligent assistant.
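
This intervene-and-correct loop resembles interactive imitation learning (DAgger-style data aggregation). The sketch below shows one plausible way human corrections could be folded into a training set alongside simulation data; the buffer layout and the oversampling weight are assumptions for illustration, not a published training recipe.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Sample = Tuple[Any, Any]  # (observation, action) pair; types simplified

@dataclass
class TrainingBuffer:
    sim_data: List[Sample] = field(default_factory=list)     # simulation rollouts
    corrections: List[Sample] = field(default_factory=list)  # 3D-device interventions

    def add_correction(self, observation: Any, human_action: Any) -> None:
        """Record a human override so the next training round imitates it."""
        self.corrections.append((observation, human_action))

    def training_batch(self, correction_weight: int = 5) -> List[Sample]:
        # Oversample corrections so sparse interventions still shape the policy.
        return self.sim_data + self.corrections * correction_weight

buffer = TrainingBuffer(sim_data=[("sim_obs", "sim_act")] * 1000)
buffer.add_correction("obs_at_failure", "corrected_grasp")
print(len(buffer.training_batch()))  # 1000 simulated + 5 weighted copies of the fix
```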

Key points:

  • 🗣️ Direct Language Control: The Rho-alpha model converts natural language directly into robot control signals, supporting complex hand-coordination operations and breaking free of traditional preset scripts.

  • 🖐️ Tactile Evolution: The model adds a tactile feedback mechanism alongside vision, letting robots adjust their behavior in real time based on actual touch; a force-perception modality will follow to further improve precision.

  • 🔄 Continuous Learning: The system supports real-time human intervention and correction via a 3D input device, learning user preferences and continuously improving in unstructured environments by combining simulation and real-world data.