I'll be presenting this in person at NAACL, tomorrow at 11am in Ballroom C! Come on by - I'd love to chat with folks about this and all things interp / cog sci!
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
(NAACL) When reading a sentence, humans predict what's likely to come next. When the ending is unexpected, this leads to garden-path effects: e.g., "The child bought an ice cream smiled." Do LLMs show similar mechanisms? @michaelwhanna.bsky.social and I investigate: arxiv.org/abs/2412.05353
🚨New arXiv preprint!🚨 LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯 We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
🚨Call for Papers🚨 The Re-Align Workshop is coming back to #ICLR2025. Our CfP is up! Come share your representational alignment work at our interdisciplinary workshop at @iclr-conf.bsky.social. Deadline is 11:59 pm AoE on Feb 3rd: representational-alignment.github.io
Apr 30, 2025
Apr 23, 2025
Excited to say that this was accepted to NAACL—looking forward to presenting it in Albuquerque!
Mar 11, 2025
Feb 19, 2025
Jan 16, 2025
Jan 24, 2025
Autoregressive transformer language models (LMs) possess strong syntactic abilities, often successfully handling phenomena from agreement to NPI licensing. However, the features they use to incrementa...
arxiv.org
Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
Michael Hanna
Aaron Mueller
Aaron Mueller
Adi Simhi
Dota Tianai Dong
Michael Hanna
Sentences are partially understood before they're fully read. How do LMs incrementally interpret their inputs? In a new paper, @amuuueller.bsky.social and I use mech interp tools to study how LMs process structurally ambiguous sentences. We show LMs rely on both syntactic & spurious features! 1/10
Dec 19, 2024
Michael Hanna