Why do we "vibe-test" and ignore leaderboards? We ran a survey to find out.
Our findings:
❌ 86% said they’ve used a model that "felt" significantly better (or worse) than its reported scores.
✅ 82% said they "vibe-test" models through direct interaction.