Isabelle Lee @ ICML
ml/nlp phding @ usc, currently visiting harvard; training & interpretability & reasoning iglee.me









We don’t always know what problems are hard for LLMs. So devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on—all without evaluating specific input examples? 🧵NEW PAPER by @jenniferlumeng.bsky.social
really excited to head home for icml:) and attending the co-located FAR.ai alignment workshop (for the first time)! would love to meet others interested in training & interpretability
also, blog: iglee.me/papers/inte... 7/6
work w/ Emmy Liu, Cathy Jiao @brihi.bsky.social, Dani Yogatama, Fazl Barez, @saxon.me. since i'm headed home for icml, presented by amazing @brihi.bsky.social! this was my first time writing a position paper, which turned into a grant, which i'm turning into multiple projects 🙂 stay tuned 6/6
3. Predicting failures. A distinction: scientific prediction (not the ML kind) is how scientists validate our understanding. A hypothesis proves its strength w/ predictive power. Used as eval, interp can predict failures from internals. Meaning, we generate eval from interp. 5/6