ProfilePosts









Excited to share that 2/2 papers from our Lab @AreaSciencePark were accepted to #NeurIPS2025 (one spotlight 🎉) Great work everyone! @alexpietroserra.bsky.social @francescortu.bsky.social @lbasile.bsky.social @lvaleriani.bsky.social @diegodoimo.bsky.social @maiorca.xyz @locatelf.bsky.social
Thanks to the amazing team at LADE @areasciencepark: @lvaleriani.bsky.social @lbasile.bsky.social @AlessioAnsuini @diegodoimo.bsky.social @albecazzaniga.bsky.social 🙏
🎯 Key finding: In these models the hidden representations of images and text form disjoint clusters and the communication between modalities is mediated by the special token <end-of-image>!
Nice start of @neuripsconf.bsky.social! Our work with @francescortu.bsky.social and @diegodoimo.bsky.social on the Competition of Mechanisms to understand counterfactuality in LLMs featured in the "Causality for LLMs" workshop :-) Check out our ACL2024 paper aclanthology.org/2024.acl-long.…
Additionally, blocking communication from this token significantly disrupts performance on standard benchmarks, while blocking image-text communication does not.
It was super fun to take our first step in interpreting multimodal LLMs, working closely with the brilliant @alexpietroserra.bsky.social and @EmanuelePanizon.
Thanks again, @diegodoimo.bsky.social and @albecazzaniga.bsky.social, for the fantastic mentorship and support! 🙏🎉 They are also attending #NeurIPS, so feel free to reach out to them to discuss our results. I'm excited to keep pushing forward on these topics! 🚀
✅ This shows that, starting from the mid-layers, a single token effectively summarizes all 1024 image tokens! ❌ This does not occur in models fine-tuned for visual understanding (such as Pixtral).
🌐 Check out our code and data at: ritareasciencepark.github.io/Narrow-gate
🚨 🚨 Excited to share our latest paper, now on #arXiv! 🖼️ We studied how unified VLMs, trained to generate both text and images (e.g., Meta's Chameleon), exchange information between modalities, comparing them to standard VLMs. 📄 Paper: arxiv.org/abs/2412.06646 Deep dive: 👇
Dec 10, 2024