What aspects of human knowledge do vision models like CLIP fail to capture, and how can we improve them? We suggest models miss key global organization; aligning them with human judgments makes them more robust. Check out Lukas Muttenthaler's work, finally out (in Nature!?) www.nature.com/articles/s41... + our blog! 1/3
Aligning foundation models with human judgments enables them to more accurately approximate human behaviour and uncertainty across various levels of visual abstraction, while additionally improving th...