Indexing Multimodal Language Models for Large-scale Image Retrieval - CVPR 2026 Findings paper
A multimodal LLM (like Qwen) can estimate image-to-image similarity remarkably well without any task-specific training.
📍 Main conference — Poster #12, ExHall A
📅 Sun, June 7, 7:30–9:00