At the recent SIGGRAPH International Conference on Computer Graphics and Interactive Techniques, NVIDIA demonstrated a series of new technologies for robot developers. The most notable is Cosmos Reason, an open-source physical AI model with 7 billion parameters that aims to give robots more efficient visual reasoning capabilities.
NVIDIA pointed out that since OpenAI introduced the CLIP model, visual language models have made significant progress in computer vision, especially in tasks such as object recognition and pattern recognition. However, traditional models often struggle with complex multi-step tasks, particularly in ambiguous or novel real-world situations. According to NVIDIA, Cosmos Reason draws on its memory and understanding capabilities to let robots reason more like humans and so make more sensible decisions about how to act in the real world.
In an application scenario demonstrated by NVIDIA, a robotic arm running this visual reasoning model identified the combination of "bread + toaster" and deduced that the next logical step was to put the bread into the toaster to toast it. NVIDIA calls this process "robot planning and reasoning," and the demonstration is meant to show how flexibly Cosmos Reason handles complex instructions.
Beyond serving as a "reasoning brain" for robots, Cosmos Reason can be applied in other AI fields. For example, it can automate the organization and annotation of large, diverse training datasets, and it can extract and analyze important information from large volumes of video data. The model is already in commercial use, and NVIDIA's internal robotics and autonomous driving teams are using it for data organization and annotation tasks.
Notably, Uber is also using Cosmos Reason to annotate and generate instructions for its autonomous driving training data. Magna International, meanwhile, is using the model to develop fully automated instant delivery solutions, aiming to help vehicles adapt more quickly to new urban environments. Companies such as VAST Data and Milestone Systems are also applying the technology in areas such as traffic monitoring and visual inspection.
Aside from Cosmos Reason, NVIDIA has also added Cosmos Transfer-2 to its Cosmos family of world models, aiming to accelerate the generation of synthetic data from 3D simulation scenes. At the same time, NVIDIA updated the Omniverse software development kit and launched a new neural reconstruction library, giving developers a wider choice of tools.
Key points:
1. 🤖 The Cosmos Reason model introduced by NVIDIA allows robots to perform efficient visual reasoning and complex decision-making.
2. 🚗 The model has been applied in multiple commercial fields, including Uber's autonomous driving data annotation and Magna International's delivery solutions.
3. 🛠️ NVIDIA has also updated development tools, promoting the integration of robotics technology and AI.