PhD student in ML at Tübingen AI Center & International Max-Planck Research School for Intelligent Systems
Andreas Hochlehnert
Cambrian-S is a valuable first step in defining what “supersensing” might mean for video models. Our results simply highlight how subtle benchmark design choices can be exploited — and how we can improve them together.
📄 arxiv.org/abs/2511.16655
🔗 github.com/bethgelab/s...
This indicates that the tailored Cambrian-S inference strategy may rely on benchmark-specific shortcuts (e.g. rooms are never revisited), rather than building a persistent, spatial world model over time.
For VSI-Super-Counting (VSC), we run a sanity check:
🔁 VSC-Repeat: we concatenate each video with itself 1-5×
✅ Unique object count stays the same
❌ Cambrian-S accuracy drops from 42% to 0%
A genuine supersensing system should be robust here.
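To make the VSC-Repeat idea concrete, here is a minimal toy sketch in Python. It is not the benchmark code and not Cambrian-S's inference strategy. Everything in it is an assumption for illustration: frames are modeled as sets of hypothetical object IDs, `copies` is read as the total number of times the video plays, and `segment_sum_count` is a made-up counter that sums per-segment counts as if no room were ever revisited.

```python
from typing import List, Set

# Hypothetical representation: each frame is the set of object IDs visible in it.
Frame = Set[str]


def repeat_video(frames: List[Frame], copies: int) -> List[Frame]:
    """Concatenate a video with itself so that it plays `copies` times in total."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return frames * copies


def unique_object_count(frames: List[Frame]) -> int:
    """Ground truth: number of distinct objects seen anywhere in the video."""
    seen: Set[str] = set()
    for frame in frames:
        seen |= frame
    return len(seen)


def segment_sum_count(frames: List[Frame], segment_len: int) -> int:
    """Toy shortcut (assumption, for illustration only): count objects per
    fixed-length segment and add the counts up, as if nothing were revisited."""
    total = 0
    for start in range(0, len(frames), segment_len):
        total += unique_object_count(frames[start:start + segment_len])
    return total


# A tiny two-room "video": two frames per room, four distinct objects overall.
video = [
    {"chair_1"},
    {"chair_1", "table_1"},
    {"lamp_1"},
    {"lamp_1", "sofa_1"},
]
repeated = repeat_video(video, copies=3)

true_before = unique_object_count(video)
true_after = unique_object_count(repeated)
shortcut_before = segment_sum_count(video, segment_len=2)
shortcut_after = segment_sum_count(repeated, segment_len=2)
```

In this toy setup the true unique count is unchanged by repetition, while the shortcut counter matches it on the original video and overcounts on the repeated one. That is the kind of failure a repetition check is designed to expose.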