Indexing Multimodal Language Models for Large-scale Image Retrieval (CVPR 2026 Findings paper): a multimodal LLM (like Qwen) can estimate image-to-image similarity remarkably well without any task-specific training. 📍 Main conference, Poster #12, ExHall A 📅 Sun 7/6, 7:30–9:00

This year, financial awards will be given to the best papers, and student support grants will be available for eligible participants. More information coming soon. Stay tuned!

🇧🇷 Presenting our ICLR 2026 paper "Efficient Probing" (EP) today! ❓What if linear probing is asking the wrong question? 🥳 EP is a lightweight attention probing method that better evaluates local, patch-level representations, such as those learned by masked image modeling (MIM). 📍Friday 24 April, P4-#3713, 15:15–17:45
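To make the contrast with linear probing concrete, here is a minimal NumPy sketch of a generic attention probe next to a mean-pooling linear probe. It is not the EP method from the paper: the single learned query, the helper names `attention_probe` and `linear_probe_mean`, and the random stand-in features and weights are all hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_probe(patches, query, W, b):
    """Hypothetical attention probe on frozen patch tokens.

    patches: (n, d) patch embeddings from a frozen backbone
    query:   (d,) learned query vector (extra parameters besides W, b)
    W, b:    (d, c) and (c,) linear classifier
    """
    d = patches.shape[-1]
    # One attention weight per patch, so the probe can focus on informative regions.
    weights = softmax(patches @ query / np.sqrt(d))  # (n,)
    pooled = weights @ patches                        # (d,)
    return pooled @ W + b, weights                    # logits (c,), weights (n,)


def linear_probe_mean(patches, W, b):
    """Baseline: linear classifier on uniformly mean-pooled patch tokens."""
    return patches.mean(axis=0) @ W + b


# Usage with random stand-ins for frozen features and probe parameters.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))  # e.g. 14x14 patch tokens of a ViT-B
W, b = rng.normal(size=(768, 10)), np.zeros(10)
logits, weights = attention_probe(patches, rng.normal(size=768), W, b)
```

With a zero query every patch gets the same weight, so the attention probe reduces exactly to the mean-pooled linear probe; a learned query lets the probe reweight patches instead of averaging them uniformly.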

🚨 Efficient Local Visual Similarity (ELViS) @ #ICLR 2026 🇧🇷 ELViS is a fast, lightweight, and interpretable module for estimating image-to-image similarity that generalizes well to many image domains. Paper: arxiv.org/abs/2603.28603 Code: github.com/pavelsuma/ELViS Come see our poster today @ P4-#3715

Our AI for Peace workshop will take place tomorrow, 26th of April, 9:00 AM - 5:00 PM in room 206. #ICLR2026 @iclr-conf.bsky.social If you are worried about military uses of AI, come support us and get informed! 🗓️ 26th of April, 9:00 AM - 5:00 PM 📍206

ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains @psuma.bsky.social @gkordo.bsky.social @skamalas.bsky.social @gtolias.bsky.social tl;dr: SuperGlue, but outputting a single similarity score computed on lightweight top descriptors. The most important component is the dustbin. arxiv.org/abs/2603.28603
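For readers unfamiliar with the dustbin: in SuperGlue, the score matrix between two sets of local descriptors is augmented with one extra row and column (the dustbin, filled with a single scalar α) so that unmatched descriptors have somewhere to send their mass, and a log-domain Sinkhorn turns the scores into a soft assignment. The NumPy sketch below reproduces that step and then collapses the plan into one number by summing the mass matched between real descriptors. That scoring rule, the cosine/temperature scores, and all function names are assumptions for illustration, not the ELViS implementation.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    """Stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_with_dustbin(scores: np.ndarray, alpha: float, iters: int = 100) -> np.ndarray:
    """SuperGlue-style log-domain Sinkhorn; returns the (m+1, n+1) log transport plan."""
    m, n = scores.shape
    # Augment the score matrix with a dustbin row and column, all set to alpha.
    couplings = np.full((m + 1, n + 1), alpha, dtype=float)
    couplings[:m, :n] = scores
    # Each real descriptor carries mass 1; each image's dustbin carries
    # mass equal to the number of descriptors in the other image.
    log_mu = np.log(np.concatenate([np.ones(m), [float(n)]]))
    log_nu = np.log(np.concatenate([np.ones(n), [float(m)]]))
    u, v = np.zeros(m + 1), np.zeros(n + 1)
    for _ in range(iters):
        u = log_mu - logsumexp(couplings + v[None, :], axis=1)
        v = log_nu - logsumexp(couplings + u[:, None], axis=0)
    return couplings + u[:, None] + v[None, :]


def similarity_score(desc_a, desc_b, alpha=0.0, temperature=0.1):
    """Hypothetical single score: fraction of mass matched between real descriptors."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    log_plan = sinkhorn_with_dustbin((a @ b.T) / temperature, alpha)
    matched = np.exp(log_plan[:-1, :-1]).sum()  # drop dustbin row and column
    return float(matched / min(len(a), len(b)))


# Usage: identical vs. unrelated sets of 4 toy descriptors in 8 dimensions.
same = similarity_score(np.eye(8)[:4], np.eye(8)[:4])
unrelated = similarity_score(np.eye(8)[:4], np.eye(8)[4:])
```

Because every real descriptor carries unit mass, the matched mass can never exceed the smaller descriptor set, so the score stays in [0, 1]. In this sketch, raising `alpha` makes it cheaper for descriptors to go unmatched, which lowers the score of unrelated pairs.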

Topics of interest include:
• instance-level classification, detection, and segmentation
• particular object/event retrieval
• personalized image/video generation
• cross-modal & multimodal recognition
• image matching, geolocation, animal re-ID
• ILR+G applications, datasets, & benchmarks
and more.