🚀 We're hiring! The @ellisinsttue.bsky.social leads the AI development for Germany’s new open-source nationwide Adaptive Intelligent System learning platform for schools (as part of a consortium led by Assecor & KI macht Schule, and mandated by the FWU). 👉 Apply now: forms.gle/XmLkwEDD45fY...
This work was a great collaboration; special shout-out to @jana-z.bsky.social for leading this project and submitting the first paper of her PhD!
How useful are self-generated 'mental images' (visual aids) in MLLM/UMM reasoning? It turns out: currently, not very. Visualizations have small errors that compound in multi-step problems, and models often ignore correct visual aids in their decision-making.
That we don't see strong benefits even from ground-truth visuals suggests that information in the visual and textual domains is somewhat misaligned, potentially because models are not trained on similar tasks.
This work is motivated by the same intuition as my work on video models last fall: can media generation capabilities be useful beyond just generating nice visuals? For real-world, embodied applications, being able to visualize the outcome of an action seems useful.
Whether self-generated visuals can at some point serve a function similar to mental imagery in human thought remains to be seen. For now, MentisOculi provides a small suite of tasks to study this topic.