I've been talking about interference weights as a challenge for mechanistic interpretability for a while.
A short note discussing them - transformer-circuits.pub/2025/interfe...
Our new note demonstrates that interference weights in toy models can demonstrate strikingly similar phenomenology to that of Towards Monosemanticity...
But more importantly, I hope it will just help clarify what we mean by interference weights!
Political violence is bad. It usually begets more political violence.
Celebrating political violence is bad. It usually encourages more political violence, against various targets.
Campus shootings are bad. They make everyone on campus less safe.
It's bad that what I wrote here is controversial.