Profile
by @danabra.mov
Profile
by @dansshadow.bsky.social
Profile
by @jimpick.com
AviHandle
by @danabra.mov
AviHandle
by @dansshadow.bsky.social
AviHandle
by @katherine.computer
EventsList
by @katherine.computer
ProfileHeader
by @dansshadow.bsky.social
ProfileHeader
by @danabra.mov
ProfileMedia
by @danabra.mov
ProfilePlays
by @danabra.mov
ProfilePosts
by @danabra.mov
ProfilePosts
by @dansshadow.bsky.social
ProfileReplies
by @danabra.mov
Record
by @atsui.org
Skircle
by @danabra.mov
StreamPlacePlaylist
by @katherine.computer
ProfilePosts






More on this production-centered mechanism across models, plus implications for evaluation, interpretation, and pre-training: 🔗 instruction-probing.github.io 📄 arxiv.org/abs/2605.11206 Team effort with @lchoshen.bsky.social @yufanghou.bsky.social @yperlitz.bsky.social 🙌 Questions? Reach out! (3/3)
Excited to present this work together with @dippedrusk.com at #EACL. Join us in poster session 1 (11:30-13:00) 🔥
27d
2mo
Andreas Waldis
Andreas Waldis
LMs that "know more" about toxicity are less toxic! Our #TACL 📄 connects behavior and internals: 💠 LMs amplify toxicity beyond humans 💠 Information about toxicity peaks in lower layers 💠 Bypassing these layers increases toxicity More details 👇 #NLProc #interpretability (1/🧵)
4mo
Andreas Waldis
Thanks a lot to everyone for the support, guidance, mentoring, collaboration, and great moments over the past years! 🙏 Without you, this journey wouldn't have been such a pleasure, and now excited to see what the future brings! 🚀
3mo
Do instructions affect how LMs process and produce language? ☝️ Not the way you think! 😲 LMs barely change task information when processing a task sample. Instead, instructions shape how this information is accessed and expressed when producing output tokens. #interpretability #nlproc (1/🧵)
27d
Andreas Waldis
In short, instructions act less on what models process, and more on what they emit. Behavior changes from prompting, including prompt instability and in-context learning, therefore seem to arise mainly at the production stage, with little adaptation during task-sample processing. (2/🧵)
It was a pleasure to ๐Ÿธ
Andreas Waldis
27d
12d