Post
We introduce a simple baseline called NoSense, an image-only (SigLIP) model that discards almost all temporal structure. Surprisingly, it reaches 95% accuracy on VSI-Super-Recall (VSR), even on 4-hour videos. This suggests VSR can be solved without true spatial supersensing.