ProfilePosts
really excited to head home for ICML :) and to attend the co-located FAR.ai alignment workshop (for the first time)! would love to meet others interested in training & interpretability
also, blog: iglee.me/papers/inte... 7/6
2. Reproducibility detects faulty mechanisms. If we are going to act on an interpretability claim, it should identify mechanisms robustly under variations in input, method, etc. The claim needs to be reproducible under specified conditions. 4/6
What does it mean for interp to meet scientific standards? We argue that it has to meet 3 criteria: falsifiability, reproducibility and predictability. 2/6
Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How? We're ready to answer. 🧵