Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI... #NLProc #CLJournal @jannikbrinkmann.bsky.social @amuuueller.bsky.social