Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS

Aaron Mueller

Representation steering is now a common way to mitigate LLM shortcuts. How much legitimate knowledge does this tend to remove? Turns out that these methods can be surprisingly precise! But also: no single steering operation will fix all shortcuts. Led by @shanzzyy.bsky.social!

New book! I have written a book, Syntax: A Cognitive Approach, published by MIT Press. It is open access; MIT Press will post a link soon, but until then, the book is available on my website: tedlab.mit.edu/tedlab_websi...

The New England Mechanistic Interpretability (NEMI) workshop is coming to BU on Aug. 14! Join us for talks, a panel, food, and plenty of opportunities to connect with the many great researchers in the area. Register and help spread the word!

4mo

✨ it's coming ✨ NEMI 2026 will be lit. It will also be the new BU interp supergroup's debut ball. Come meet us!

5mo

I truly believe the rapid advances in the mech interp subfield have something real to offer AI ethics researchers: a chance to look beyond the HOW of evals to the WHY, a first pass at a technical solution when we see the opportunity, and a new avenue for showing failures that prove models are not gods.

I also want to mention that the lang x computation research community at BU is growing in an exciting direction, especially with new faculty like @amuuueller.bsky.social, @anthonyyacovone.bsky.social, @nsaphra.bsky.social, & @profsophie.bsky.social! Also, Boston is quite nice :)

Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI... #NLProc #CLJournal @jannikbrinkmann.bsky.social @amuuueller.bsky.social