Want the outstanding performance of MASt3R while using a ViT-B or ViT-S encoder instead of its ViT-L one? Don't miss how we build DUNE, a single encoder for diverse 2D & 3D tasks, at this afternoon's #CVPR2025 poster session (poster #376). Paper: arxiv.org/abs/2503.14405 Code: github.com/naver/dune

1/ 📄 Paper 1: "DUNE: Distilling a UNiversal Encoder from Heterogeneous 2D and 3D Teachers". We propose DUNE, a ViT-based encoder distilled from multiple specialized 2D & 3D foundation models to unify visual tasks across 2D, 3D, and human understanding.