San Diego, California, Nov. — At NeurIPS 2025, NVIDIA launched Alpamayo-R1, its first reasoning vision-language-action (VLA) model for Level 4 autonomous driving, releasing it simultaneously on GitHub and Hugging Face. The new model builds on the Cosmos-Reason series launched in August this year and processes camera, LiDAR, and text instructions in a single pass. It reasons internally before outputting driving decisions, which NVIDIA says injects "common sense" into vehicles.

Key features of Alpamayo-R1:
- Unified architecture: the vision, language, and action modalities are trained end to end, avoiding the error accumulation of separate modules
- Reasoning pipeline: the Cosmos reasoning chain lets the model reason through multiple steps for scenarios such as "the car ahead is braking" or "a pedestrian is crossing" before outputting acceleration, braking, and steering signals
- Ready to use: weights, inference scripts, and evaluation tools are packaged in the "Cosmos Cookbook," so developers can fine-tune as needed (see the usage sketch after this list)
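The official loading and fine-tuning workflow lives in the Cosmos Cookbook and is not described in this article; the minimal sketch below only illustrates the kind of "ready to use" workflow the release implies. The repository id `nvidia/Alpamayo-R1`, the prompt wording, and the standard Transformers interface are all assumptions, and only a camera frame plus a text instruction are shown, not the LiDAR input the model also accepts.

```python
# Hypothetical usage sketch -- repo id, prompt format, and output handling
# are assumptions for illustration; the official scripts ship in the
# Cosmos Cookbook.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

REPO_ID = "nvidia/Alpamayo-R1"  # placeholder repo id, not confirmed

# Load the processor (camera frames + text) and the model weights.
processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# A single front-camera frame plus a natural-language driving instruction.
frame = Image.open("front_camera.jpg")
prompt = "The vehicle ahead is braking. Reason step by step, then decide."

inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)

# The model is expected to emit a reasoning trace followed by a driving
# decision (acceleration / braking / steering); here we simply decode the text.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

In practice, fine-tuning and closed-loop evaluation would follow the recipes packaged in the Cosmos Cookbook rather than this generic interface.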
NVIDIA Chief Scientist Bill Dally said that robotics and autonomous driving will be at the core of the next wave of AI: "We want to be the brain of all machines." Alongside the new model, the company also released an end-to-end guide covering data synthesis, model evaluation, and post-training, encouraging automakers and robotaxi teams to quickly validate Level 4 capabilities in limited operating areas.
Analysts believe that open-source reasoning models can significantly lower the R&D barrier for automakers, but whether Alpamayo-R1 can pass functional safety certification and meet automotive-grade real-time requirements remains a key hurdle on the path to commercialization.
