🚨Go check out our most recent preprint on multilayer attention probing for Vision Transformers! It's almost as performant as full fine-tuning while being about as compute-efficient as standard linear probing! Plus you get interpretable attention maps! More in 🧵👇🏼
Preprint: arxiv.org/abs/2601.09322
With the rise of large-scale foundation models, efficiently adapting them to downstream tasks remains a central challenge. Linear probing, which freezes the backbone and trains a lightweight head, is ...