Profile
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
Aaron Mueller






Representation steering is now a common way to mitigate LLM shortcuts. How much legitimate knowledge does this tend to remove? Turns out that these methods can be surprisingly precise! But also: no single steering operation will fix all shortcuts. Led by @shanzzyy.bsky.social!
New book! I have written a book, called Syntax: A cognitive approach, published by MIT Press. This is open access; MIT Press will post a link soon, but until then, the book is available on my website: tedlab.mit.edu/tedlab_websi...
The New England Mechanistic Interpretability (NEMI) workshop is coming to BU on Aug. 14! Join us for talks, a panel, food, and plenty of opportunities to connect with the many great researchers in the area. Register and help spread the word!
✨ it's coming ✨ NEMI 2026 will be lit. It will also be the new BU interp supergroup's debut ball. Come meet us!
I truly believe the rapid advances in the mech interp subfield have something real to offer AI ethics researchers: A chance to look beyond the HOW of evals to the WHY, a first pass at a technical solution when we see the opportunity, a new avenue for showing failures that prove models are not gods
I also want to mention that the lang x computation research community at BU is growing in an exciting direction, especially with new faculty like @amuuueller.bsky.social, @anthonyyacovone.bsky.social, @nsaphra.bsky.social, & @profsophie.bsky.social! Also, Boston is quite nice :)
Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI... #NLProc #CLJournal @jannikbrinkmann.bsky.social @amuuueller.bsky.social