Cohere Labs · 10mo

While Elo ratings work well for chess ♟️, they struggle with LLM evaluation because of volatility and transitivity issues. A new post in collaboration with AI Singapore explores why Elo falls short for AI leaderboards and how we can do better.
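To make the volatility and transitivity points concrete, here is a minimal sketch of standard sequential Elo updates. It is illustrative only and not the method from the linked post. The K-factor of 32, the starting rating of 1000, and the three hypothetical models `A`, `B`, `C` are assumptions. The same three results, a rock-paper-scissors cycle (A beats B, B beats C, C beats A), are fed in two different orders.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_elo(matches, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply sequential Elo updates for (winner, loser) pairs, in order.

    k and initial are assumed illustrative values, not taken from the post.
    """
    ratings = {}
    for winner, loser in matches:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        e_w = expected_score(r_w, r_l)
        # The winner gains what the loser loses, so the total stays constant.
        ratings[winner] = r_w + k * (1.0 - e_w)
        ratings[loser] = r_l - k * (1.0 - e_w)
    return ratings


# Hypothetical models A, B, C with cyclic (non-transitive) results.
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
reordered = [("C", "A"), ("B", "C"), ("A", "B")]

first = run_elo(cycle)
second = run_elo(reordered)
print(sorted(first, key=first.get, reverse=True))
print(sorted(second, key=second.get, reverse=True))
```

With identical results, model `A` finishes last in the first ordering and first in the second. That is the order sensitivity behind Elo's volatility. Because the outcomes form a cycle, no single scalar rating per model can represent them faithfully, which is the transitivity problem.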