Some interesting detective work alleging that proprietary LLM developers are gaming the Chatbot Arena leaderboard, with the collusion of Chatbot Arena's operators.
Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged ...