LMs that "know more" about toxicity are less toxic! Our #TACL 📄 connects behavior and internals: 💠 LMs amplify toxicity beyond humans 💠 Information about toxicity peaks in lower layers 💠 Bypassing these layers increases toxicity More details👇 #NLProc #interpretability (1/🧵)

Current LLMs are exactly 0 useful for tweaking style in PDFs generated with LaTeX from Quarto documents. It's niche, admittedly, but a great area showing severe limits, close to a complete breakdown of capabilities. Not possible to entertain the illusion of "understanding" here. Full hallucination mode.