Inlay

Model preferences are a two-way street between the model’s capability and the user’s perspective. By bridging the gap between benchmarks and real-world vibe-testing, we can evaluate AI the way humans actually use it.

arxiv.org/abs/2604.14137
technion-cs-nlp.github.io/vibe-testin...

A paper on vibe-testing and personalized LLM evaluation, showing that personalization can change which model users prefer.