paulgavrikov.github.io/visualoverload
Joint work with Wei Lin, M. Jehanzeb Mirza, Soumya Jahagirdar, Muhammad Huzaifa, Sivan Doveh, Serena Yeung-Levy, James Glass, Hilde Kuehne.
Do Vision-Language Models (VLMs) actually "see" everything in a crowded room?
Today at #CVPR2026, we are presenting VisualOverload, our work exploring the critical visual perception bottlenecks of VLMs in dense scenes.
Today (Poster Session 6), 5:30 PM - 7:30 PM, Poster 431 (ExHall A)
Visit our #CVPR2026 poster #179 at 11:50-12:30 to learn about issues and solutions for negation in CLIP. Work led by Fawaz Sammani and Tzoulio Chamiti.
This is the first time I fully vibecoded a tool, and it was impressive how far I got in the little time I invested. Claude (Antigravity) did not "one-shot" this, but the few bugs I found were smaller details. Give it a try!
github.com/paulgavrikov...
Meet Slurm Manager: a self-hosted web dashboard for Slurm clusters.
Connect via SSH, monitor nodes & jobs in real time, submit scripts, view fairshare quotas, all from your browser. Basically, a handy wrapper over Slurm commands via SSH.
The paper introduces VisualOverload, a new visual question answering (VQA) benchmark designed to test vision-language models (VLMs) on densely populated, detail-rich scenes using public-domain paintin...
Sure, but except for the desk rejects, there's no feedback to optimize on. You start reviewing (good or bad) and just keep doing what you did. I think it would be great to provide at least some high-level feedback or scores.
That was our inspiration :)
Will it be shared with reviewers? I think some kind of feedback would be great, especially for first-time reviewers.