Ever used a top-ranked LLM that just... felt wrong for you? You’re not alone. Instead of leaderboards, many of us turn to "vibe-testing" - manually comparing models to our own needs. But can we turn these feelings into a structured evaluation? New paper: "From Feelings to Metrics" 🧵
1mo
Itay Itzhak @ COLM 🍁