🧵 Feeling safe against data poisoning in post-training? Think again!
Individual components of LLM post-training pipelines are surprisingly robust to data poisoning attacks.
In work led by Jack Sanderson (co-advised with Yiwei Lu), we show they crumble when attacked together. 1/n