I am humbled to join this excellent team and work on delivering the highest quality human preference LLM evals! ⚔️⚔️⚔️
curiosity
discovery
goofiness
I've been following this project since it first showed up in 2023 in my Google Scholar alerts for papers that cite Elo, and I had fun experimenting with its data and contributing to it as open source before it was a company.
My brain is living in my head rent free
EsportsBench has been refreshed with data through June 2025: over 61k new matches across 20 esports were recorded in the last 3 months!
huggingface.co/datasets/Esp...
Extremely excited to announce that I've joined @lmarena.bsky.social !
For years I've worked on LLMs for my job and hacked on rankings and ratings for fun, so I'm beyond thrilled to join a project at the intersection of the two!
Just ran into Simpson's paradox in the wild for the first time lol. I was looking at some data and thought, "That doesn't look right, all the means went up when all I did was assign groups differently. This is like Simpson's paradox or something lol."