While steering methods effectively control target behavior, they substantially increase LLMs’ vulnerability to jailbreaks, revealing a failure of robust specificity. If you’re at EACL, stop by my poster at 9 AM today to hear more. Here's a link to the full paper: aclanthology.org/2026.eacl-lo...
Navita Goyal, Hal Daumé III. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026.
aclanthology.org
Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions
Navita Goyal