What does Mentis Oculi test?
Mentis Oculi is a collection of visual reasoning tasks (e.g. Rush Hour, Sliding Puzzle) designed to probe whether models can mentally transform visual states across multiple steps.
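Each puzzle is specified by a single image, but solving it requires a visual rollout: the solver has to imagine the sequence of intermediate states that each move produces, since none of them is shown.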
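To make the idea of a rollout concrete, here is a minimal sketch in Python for a 3x3 sliding puzzle. It is an illustration only, not code from the benchmark: the tuple-based board encoding, the move names, and the helpers `apply_move` and `rollout` are all assumptions made for this example. A symbolic solver can track states this way; the benchmark asks whether a model can do the equivalent starting from the image alone.

```python
from typing import List, Tuple

# Hypothetical encoding: a board is a tuple of rows, and 0 marks the blank cell.
Board = Tuple[Tuple[int, ...], ...]

# Direction in which the blank moves, as (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def apply_move(board: Board, move: str) -> Board:
    """Return the board after sliding the blank one cell in the given direction."""
    rows, cols = len(board), len(board[0])
    # Locate the blank cell.
    r, c = next(
        (i, j) for i in range(rows) for j in range(cols) if board[i][j] == 0
    )
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < rows and 0 <= nc < cols):
        raise ValueError(f"illegal move {move!r}: blank would leave the board")
    # Swap the blank with the neighbouring tile on a mutable copy.
    grid = [list(row) for row in board]
    grid[r][c], grid[nr][nc] = grid[nr][nc], grid[r][c]
    return tuple(tuple(row) for row in grid)


def rollout(start: Board, moves: List[str]) -> List[Board]:
    """Return every state visited, from the start state to the final one."""
    states = [start]
    for move in moves:
        states.append(apply_move(states[-1], move))
    return states


start = ((1, 2, 3), (4, 0, 6), (7, 5, 8))
states = rollout(start, ["down", "right"])
for state in states:
    print(state)
```

Only `start` corresponds to what the puzzle image shows. The two later entries in `states` are the intermediate and final boards that have to be imagined, and the number of such unseen states grows with the length of the solution.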