Post
Model preferences are a two-way street between the model’s capability and the user’s perspective. By bridging the gap between benchmarks and real-world vibe-testing, we can evaluate AI the way humans actually use it. arxiv.org/abs/2604.14137 technion-cs-nlp.github.io/vibe-testin...
A paper on vibe-testing and personalized LLM evaluation, showing that personalization can change which model users prefer.
technion-cs-nlp.github.io
From Feelings to Metrics: Understanding and Formalizing How Users VIBE-TEST LLMs
1mo
Itay Itzhak @ COLM 🍁