ProfilePosts
My brain is living in my head rent free
I am humbled to join this excellent team and work on delivering the highest quality human preference LLM evals! ⚔️⚔️⚔️
Grace
Then I spent another hour debugging the data for NaNs and nulls and corruption until I realized that it actually was Simpson's paradox
Just ran into Simpson's paradox in the wild for the first time lol. Was looking at some data and was like "that doesn't look right, all the means went up when all I did was assign groups differently, this is like Simpson's paradox or something lol"
Extremely excited to announce that I've joined @lmarena.bsky.social! For years I've been working in LLMs for my job and hacking on rankings and ratings for fun, so I'm beyond thrilled to be able to join this project at the intersection!
I've been following this project since it first showed up in my Google Scholar notifications for papers that cite Elo in 2023, and I had fun experimenting with their data and contributing open source before it was a company.
EsportsBench has been refreshed with data up through June 2025. Over 61k new matches across 20 esports have been recorded in the last 3 months! huggingface.co/datasets/Esp...
Clayton Thorrez