

We experimented with different backbones, camera pose representations, and attention mechanisms, and studied how the model scales. We evaluate on hundreds of full-length videos across several metrics, without aligning the predicted trajectory to the ground truth, to mirror real-world use.
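To make "without aligning" concrete, here is a minimal sketch of a trajectory error computed directly on raw predicted positions. The post does not name the metrics used, so the RMSE of per-frame translation error, the function name `unaligned_ate_rmse`, and the toy trajectories are all assumptions for illustration only.

```python
import math


def unaligned_ate_rmse(pred_positions, gt_positions):
    """RMSE of per-frame translation error with no alignment step.

    Hypothetical helper: both inputs are sequences of (x, y, z) camera
    positions in metres, expressed in the same world frame.
    """
    if len(pred_positions) != len(gt_positions):
        raise ValueError("trajectories must have the same number of frames")
    if not pred_positions:
        raise ValueError("trajectories must be non-empty")
    # Squared Euclidean distance between predicted and true position, per frame.
    squared_errors = [
        sum((p - g) ** 2 for p, g in zip(pred, gt))
        for pred, gt in zip(pred_positions, gt_positions)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (assumed): the prediction has the right shape but twice the scale.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
error = unaligned_ate_rmse(pred, gt)
```

A similarity alignment before scoring would absorb the scale error in this toy case and report a near-zero value, whereas the unaligned score stays well above zero. That is why skipping alignment is the stricter test for a model that claims metric poses.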

📽️ Check out Visual Odometry Transformer! VoT is an end-to-end model for getting accurate metric camera poses from monocular videos. vladimiryugay.github.io/vot/

Thanks to the team, Kien Nguyen, Theo Gevers, @cgmsnoek.bsky.social, and @martin-r-oswald.bsky.social from the University of Amsterdam!

⏩ GaME code release! github.com/VladimirYuga...
Grab components for your 3D reconstruction pipeline:
🔹 Purely geometric out-of-view scene change detection
🔹 Outdated observations filtering
🔹 Evaluation videos of changing scenes
Contributions welcome 🚀


VoT requires no calibration or post-optimization and runs in real time on videos of thousands of frames. It is trained on a large amount of real-world indoor data, yet can also work well in outdoor scenes. It uses only camera poses as supervision, making it broadly accessible.



Vladimir Yugay