Introducing Nvidia’s Cosmos World Foundation Models: A Leap Towards Physics-Aware AI
Nvidia unveiled its Cosmos World Foundation Models (WFMs) at CES 2025 in Las Vegas. Inspired by the way humans naturally develop mental models of the world, these AI models predict and generate video with an awareness of physics. They are available through Nvidia’s API and NGC catalogs, GitHub, and the AI development platform Hugging Face, and the family is divided into three categories:
- Nano: Tailored for low latency and real-time applications.
- Super: Offers highly performant baseline models.
- Ultra: Provides maximum quality and fidelity outputs.
“Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation,” the company stated in its blog post.
{TechCrunch}
The Cosmos WFMs range in size from 4 billion to 14 billion parameters. Parameter count roughly corresponds to a model’s problem-solving ability: models with more parameters generally perform better than those with fewer. Nvidia is also releasing an upsampling model aimed at augmented reality, alongside guardrail models intended to ensure responsible use. These releases support applications such as generating sensor data for autonomous vehicle development.
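To put the 4 billion to 14 billion parameter range in practical terms, here is a rough sketch of how much memory the weights alone would occupy. The numeric precisions and byte widths are assumptions chosen for illustration, not formats Nvidia has confirmed for Cosmos, and the estimate ignores activations and any runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate.
# The precisions below are illustrative assumptions, not published Cosmos specs.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

# The two ends of the parameter range quoted in the article.
SIZES = {"smallest (4B)": 4e9, "largest (14B)": 14e9}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for label, params in SIZES.items():
    for precision, width in BYTES_PER_PARAM.items():
        size = weight_memory_gb(params, width)
        print(f"{label:>14} @ {precision:<9}: {size:5.1f} GB")
```

The point of the sketch is only that the larger models cost proportionally more memory to load, which is one reason a smaller model can suit latency-sensitive uses.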
Nvidia says the models were trained on 9,000 trillion tokens drawn from 20 million hours of real-world data, covering human interactions as well as industrial, robotics, and driving environments. Nvidia has not disclosed the data sources, and some reports suggest that copyrighted YouTube videos may have been used without explicit permission. In response, Nvidia maintains that Cosmos was developed ethically and lawfully, within the bounds of fair use.
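The two training figures can be related with simple arithmetic. The sketch below assumes both numbers describe the same corpus and that tokens are spread evenly across the footage; how the video was tokenized is not described here, so the result is an average density and nothing more.

```python
# Figures as reported in the article.
TOTAL_TOKENS = 9_000 * 10**12  # 9,000 trillion tokens
TOTAL_HOURS = 20 * 10**6       # 20 million hours of data

# Average token density, assuming a uniform spread (an assumption).
tokens_per_hour = TOTAL_TOKENS // TOTAL_HOURS
tokens_per_second = tokens_per_hour // 3600

print(f"{tokens_per_hour:,} tokens per hour")
print(f"{tokens_per_second:,} tokens per second")
```

Under those assumptions the corpus averages 450 million tokens per hour of data, or 125,000 tokens per second.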
Cosmos WFMs can generate synthetic data from text or video frames, which makes them suited to training robots and autonomous vehicles. Companies including Waabi, Wayve, Foretellix, and Uber are already testing the models for various uses. As Uber CEO Dara Khosrowshahi noted, “Generative AI will power the future of mobility,” describing the collaboration with Nvidia as a step toward advancing autonomous driving technology.
Despite being labeled ‘open’, these models aren’t open source in the traditional sense. Nvidia hasn’t disclosed comprehensive training data details or provided all the tools needed to recreate a Cosmos WFM from scratch. That is why Nvidia refers to them as ‘open’ rather than ‘open source’.
“We really hope [Cosmos will] do for the world of robotics and industrial AI what Llama … has done for enterprise,” stated Jensen Huang, Nvidia’s CEO.
{Press Event}