Vision Transformers are brittle to variable input resolutions. In dense prediction tasks, the standard fix is sliding-window inference with heavy overlap, which is effective but painfully slow.
SPAR (Single-Pass Any-Resolution ViT) takes a different approach.
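For contrast, the overlapping sliding-window baseline from the first paragraph can be sketched in Python. This is a generic illustration under stated assumptions, not SPAR and not any particular library's implementation. `sliding_window_predict` and the dummy model are hypothetical names, and the model is assumed to map a `window × window` tile to per-class logits of shape `(num_classes, window, window)`. Every window position costs one forward pass, and overlapping predictions are averaged per pixel.

```python
from typing import Callable

import numpy as np


def sliding_window_predict(
    image: np.ndarray,
    model: Callable[[np.ndarray], np.ndarray],
    window: int,
    stride: int,
    num_classes: int,
) -> tuple[np.ndarray, int]:
    """Hypothetical sliding-window dense prediction.

    Returns (per-pixel averaged logits of shape (C, H, W), number of model calls).
    """
    h, w = image.shape[:2]
    assert h >= window and w >= window, "image must fit at least one window"
    assert 0 < stride <= window, "stride > window would leave uncovered pixels"

    logits = np.zeros((num_classes, h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)

    def starts(size: int) -> list[int]:
        # Regular grid of offsets, plus one final window flush with the border
        offsets = list(range(0, size - window + 1, stride))
        if offsets[-1] != size - window:
            offsets.append(size - window)
        return offsets

    calls = 0
    for y in starts(h):
        for x in starts(w):
            tile = image[y:y + window, x:x + window]
            # One full forward pass per tile; this is where the cost comes from
            logits[:, y:y + window, x:x + window] += model(tile)
            counts[y:y + window, x:x + window] += 1
            calls += 1

    # Average the overlapping predictions at each pixel
    return logits / counts, calls


def dummy_model(tile: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a ViT segmentation head: constant logits for 3 classes
    return np.ones((3, tile.shape[0], tile.shape[1]))


img = np.zeros((512, 512, 3))
out, n_calls = sliding_window_predict(img, dummy_model, window=224, stride=112, num_classes=3)
print(out.shape, n_calls)  # (3, 512, 512) 16
```

With a 224-pixel window, a stride of 112 already needs 16 forward passes on a 512×512 image. Halving the stride to 56 raises that to 49 passes, which is the kind of cost a single-pass method is meant to avoid.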
Kevin Kia
The Two Doors
The Instance-Level Recognition and Generation workshop will be back at @eccv.bsky.social 2026, with excellent keynote speakers, a call for papers, and travel grants for students.
Our work Global-Aware Edge Prioritization for Pose Graph Initialization is a CVPR 2026 oral paper and award candidate.
Oral: Sun, Jun 7, 10:15–11:30 at Bluebird Ballroom (Oral Session 5A: Dynamic Perception)
Poster: Sun, Jun 7, 11:45 AM–1:45 PM at ExHall F (Poster Session 5), Poster #2
Indexing Multimodal Language Models for Large-scale Image Retrieval: CVPR 2026 Findings paper
A multimodal LLM (like Qwen) can estimate image-to-image similarity remarkably well without any task-specific training.
Main conference: Poster #12, ExHall A
Sun 7/6, 7:30–9:00