
The object-level and spatial priors in DINOv2 features enable robust scene understanding, which is essential for navigation and manipulation tasks. With these priors, DINO-WM outperforms state-of-the-art world models by 45% in downstream task performance on our hardest tasks.

Can we extend the power of world models beyond just online model-based learning? Absolutely! We believe the true potential of world models lies in enabling agents to reason at test time. Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.

Jan 31, 2025

Overall, DINO-WM takes a step toward bridging the gap between task-agnostic world modeling on one side and reasoning and control on the other, offering promising prospects for generic world models in real-world applications.
