We use MASt3R's two-view prior as our only network with no fine-tuning.
By leveraging this 3D prior and making minimal assumptions on the camera model, we can handle dynamically changing zoom.
Efficient test-time optimisation and loop closure enable large-scale consistency.