Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
Aaron Mueller
Representation steering is now a common way to mitigate LLM shortcuts. How much legitimate knowledge does this tend to remove? Turns out that these methods can be surprisingly precise! But also: no single steering operation will fix all shortcuts.
Led by @shanzzyy.bsky.social!
New book! I have written a book, called Syntax: A cognitive approach, published by MIT Press.
This is open access; MIT Press will post a link soon, but until then, the book is available on my website:
tedlab.mit.edu/tedlab_websi...
The New England Mechanistic Interpretability (NEMI) workshop is coming to BU on Aug. 14!
Join us for talks, a panel, food, and plenty of opportunities to connect with the many great researchers in the area.
Register and help spread the word!
✨ it's coming ✨
NEMI 2026 will be lit. It will also be the new BU interp supergroup's debut ball. Come meet us!
I truly believe the rapid advances in the mech interp subfield have something real to offer AI ethics researchers: A chance to look beyond the HOW of evals to the WHY, a first pass at a technical solution when we see the opportunity, a new avenue for showing failures that prove models are not gods
I also want to mention that the lang x computation research community at BU is growing in an exciting direction, especially with new faculty like @amuuueller.bsky.social, @anthonyyacovone.bsky.social, @nsaphra.bsky.social, & @profsophie.bsky.social! Also, Boston is quite nice :)
Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI... #NLProc #CLJournal @jannikbrinkmann.bsky.social @amuuueller.bsky.social