What a damning abstract
Apr 30, 2025
Max Ilse
Some interesting detective work alleging that proprietary LLM developers are gaming the Chatbot Arena leaderboards, with collusion from Chatbot Arena's operators.
Apr 30, 2025
Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged ...
arxiv.org
The Leaderboard Illusion
Mark J. Nelson