Indexing Multimodal Language Models for Large-scale Image Retrieval - CVPR 2026 Findings paper
A multimodal LLM (like Qwen) can estimate image-to-image similarity remarkably well without any task-specific training.
📍 Main conference — Poster #12, ExHall A
📅 Sun 7/6, 7:30–9:00
This year, financial awards will be given to the best papers, and student support grants will be available for eligible participants.
More information coming soon. Stay tuned!
🇧🇷 Presenting our ICLR 2026 paper “Efficient Probing” (EP) today!
❓What if linear probing is asking the wrong question?
🥳 EP is a lightweight attention-probing method that evaluates local, patch-level representations more faithfully than linear probing, for example those of models pretrained with masked image modeling (MIM).
📍Friday 24 April, P4-#3713, 15:15–17:45
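Linear probing usually pools all patch tokens (or takes the [CLS] token) before a linear classifier, so a signal carried by only a few patches can get diluted. As a rough illustration of why attention probing helps, here is a generic single-query attention-pooling probe in pure Python, shown next to a mean-pooling linear probe. This is a sketch under stated assumptions, not EP's architecture: the function names, the single learned `query`, and the toy inputs are hypothetical, and training is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled: Vector, weights: List[Vector], bias: Vector) -> Vector:
    # Shared linear head: one row of `weights` per class.
    return [dot(w, pooled) + b for w, b in zip(weights, bias)]


def linear_probe(patches: List[Vector], weights: List[Vector], bias: Vector) -> Vector:
    # Baseline: average all patch tokens, then apply the linear head.
    dim = len(patches[0])
    pooled = [sum(p[i] for p in patches) / len(patches) for i in range(dim)]
    return classify(pooled, weights, bias)


def attention_probe(patches: List[Vector], query: Vector,
                    weights: List[Vector], bias: Vector) -> Tuple[Vector, Vector]:
    # Hypothetical single-query attention pooling: score each patch against
    # a learned query, softmax the scores, and pool patches by those weights.
    scale = 1.0 / math.sqrt(len(query))
    attn = softmax([dot(query, p) * scale for p in patches])
    dim = len(patches[0])
    pooled = [sum(a * p[i] for a, p in zip(attn, patches)) for i in range(dim)]
    return classify(pooled, weights, bias), attn


# Toy input: only the last patch carries the class signal.
patches = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
query = [5.0, 0.0]                   # stands in for a learned query
weights = [[1.0, 0.0], [-1.0, 0.0]]  # two-class linear head
bias = [0.0, 0.0]

print(linear_probe(patches, weights, bias))
print(attention_probe(patches, query, weights, bias)[0])
```

Here mean pooling shrinks the single informative patch by a factor of four (logits 1 and -1), while the attention weights concentrate almost entirely on that patch and the logits approach 4 and -4.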
🚨 Efficient Local Visual Similarity (ELViS) @ #ICLR 2026 🇧🇷
ELViS is a fast, lightweight, and interpretable module for estimating image-to-image similarity that generalizes well to many image domains.
Paper: arxiv.org/abs/2603.28603
Code: github.com/pavelsuma/ELViS
Come see our poster today @ P4-#3715
Our AI for Peace workshop will take place tomorrow, 26th of April, 9:00 AM - 5:00 PM in room 206. #ICLR2026 @iclr-conf.bsky.social
If you are worried about the military uses of AI, come support us and get informed!
🗓️ 26th of April, 9:00 AM - 5:00 PM
📍206
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
@psuma.bsky.social @gkordo.bsky.social @skamalas.bsky.social @gtolias.bsky.social
tl;dr: like SuperGlue, but it outputs a single similarity score and runs on lightweight top local descriptors. The most important ingredient is the dustbin.
arxiv.org/abs/2603.28603
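The tl;dr compares ELViS to SuperGlue, whose dustbin lets a local descriptor stay unmatched instead of being forced onto a poor partner. As an illustration only, here is a minimal pure-Python sketch of SuperGlue-style log-space Sinkhorn with a dustbin row and column, plus one way to collapse the resulting transport plan into a single image-to-image score. Assumptions: the descriptor score matrix is given, the dustbin value `alpha` is fixed rather than learned, and the helper names and the "matched mass divided by min(M, N)" score are hypothetical. This is not ELViS's actual scoring head, which is described in the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn_with_dustbin(scores: Matrix, alpha: float, iters: int = 100) -> Matrix:
    """Log-space Sinkhorn on an M x N score matrix augmented SuperGlue-style
    with a dustbin row and column whose entries all equal `alpha`."""
    m, n = len(scores), len(scores[0])
    z = [row[:] + [alpha] for row in scores] + [[alpha] * (n + 1)]
    # Each real descriptor carries mass 1; each dustbin can absorb the
    # full descriptor count of the other image.
    log_a = [0.0] * m + [math.log(n)]
    log_b = [0.0] * n + [math.log(m)]
    u = [0.0] * (m + 1)
    v = [0.0] * (n + 1)
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [log_a[i] - logsumexp([z[i][j] + v[j] for j in range(n + 1)])
             for i in range(m + 1)]
        v = [log_b[j] - logsumexp([z[i][j] + u[i] for i in range(m + 1)])
             for j in range(n + 1)]
    return [[math.exp(z[i][j] + u[i] + v[j]) for j in range(n + 1)]
            for i in range(m + 1)]


def similarity(scores: Matrix, alpha: float = 0.0, iters: int = 100) -> float:
    # Hypothetical single score: transport mass that lands on real matches
    # (not on a dustbin), divided by the smaller set size, so it lies in [0, 1].
    plan = sinkhorn_with_dustbin(scores, alpha, iters)
    m, n = len(scores), len(scores[0])
    matched = sum(plan[i][j] for i in range(m) for j in range(n))
    return matched / min(m, n)


related = [[4.0, -4.0], [-4.0, 4.0]]      # each descriptor has one clear partner
unrelated = [[-4.0, -4.0], [-4.0, -4.0]]  # nothing matches well
print(similarity(related), similarity(unrelated))
```

With these toy inputs, `related` scores well above `unrelated`. Without the dustbin, a square score matrix would force every descriptor's unit of mass onto some real match, so this score would be exactly 1 for any pair of images.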