Do instructions affect how LMs process and produce language? ☝️ Not the way you think! 😲 LMs barely change task information when processing a task sample. Instead, instructions shape how this information is accessed and expressed when producing output tokens. #interpretability #nlproc (1/🧵)

26d

I already presented some work on reference (names, pronouns, coreference resolution, pronoun fidelity, etc.) as a rich site to evaluate biases and commonsense reasoning, and our work on disentangling model behaviour and internals through aligned probing (led by @tresiwald.bsky.social).

In short, instructions act less on what models process, and more on what they emit. Behavior changes from prompting, including prompt instability and in-context learning, therefore seem to arise mainly at the production stage, with little adaptation during task-sample processing. (2/🧵)

Thanks a lot to everyone for the support, guidance, mentoring, collaboration, and great moments over the past years! 🙏 Without you, this journey wouldn't have been such a pleasure — and now excited to see what the future brings! 🚀

It was a pleasure to 🍸

Excited to present this work together with @dippedrusk.com at #EACL. Join us at poster session 1 (11:30-13:00) 🔥

More on this production-centered mechanism across models, plus implications for evaluation, interpretation, and pre-training: 🔗 instruction-probing.github.io 📄 arxiv.org/abs/2605.11206 Team effort with @lchoshen.bsky.social @yufanghou.bsky.social @yperlitz.bsky.social 🙌 Questions? Reach out! (3/3)

2mo

Andreas Waldis