
Excited to present this work together with @dippedrusk.com at #EACL. Join us at poster session 1 (11:30-13:00) 🔥

LMs that "know more" about toxicity are less toxic! Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details 👇 #NLProc #interpretability (1/🧵)