ProfilePosts


🤗 Super excited to have this work out! Turns out by calculating the angles 📐 between representations, you can pick out difficult data samples! This can be very useful for assembling hard test sets or more efficient training sets. See more cool results and visuals in the 🧵
this is so sick tbh
Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as principled, scientific evaluation? If it met scientific standards? arxiv.org/abs/2605.05508 coming to EvalEval at ACL as oral 🧵 1/6
7h
4h
23h
Ruochen
Isabelle Lee @ ICML
alex williams
We don’t always know what problems are hard for LLMs. So devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on—all without evaluating specific input examples? 🧵NEW PAPER by @jenniferlumeng.bsky.social
8h
Our new paper sets the stage for the biggest practical use case of model interpretability: stress testing and dataset development. All you need is interpretable linear features and simple geometry.
8h
Humans cannot always intuit what scenarios are most challenging to LLMs. Hoping to capture challenging edge cases, developers either design problems to be difficult for humans or curate extensive benc...
arxiv.org
Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry
Naomi Saphra
Naomi Saphra