LMs that "know more" about toxicity are less toxic! Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details 👇 #NLProc #interpretability (1/🧵)
Andreas Waldis