How useful are self-generated 'mental images' (visual aids) in MLLM/UMM reasoning?
Turns out: currently not very. The generated visualizations contain small errors that compound over multi-step problems, and models often ignore even correct visual aids in their decision-making.
Thaddäus Wiedemer
Can AI reason by “imagining” — not just by seeing or reading?
We introduce Mentis Oculi, a benchmark for machine mental imagery: multi-step visual puzzles that require maintaining and updating visual states over time.
📄 arxiv.org/abs/2602.02465
🌐 jana-z.github.io/mentis-oculi/
🧵⬇️