I truly believe the rapid advances in the mech interp subfield have something real to offer AI ethics researchers: a chance to look beyond the HOW of evals to the WHY, a first pass at a technical solution when we see the opportunity, and a new avenue for showing failures that prove models are not gods.
It’s pretty cool that they named this platform after a Wilco album
🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University!
Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
See the website for more info: nemiconf.github.io/summer26/
Registration: forms.gle/qUNq84pB6AyU...
Submission: forms.gle/PEfMyL4J3PL9...
Micah Benson
One of the most common features of AI delusional spirals in our recent study is a belief that the AI is sentient or has a personality. This played a central role in the delusional narratives, and correlated with increased use. Regulators and AI developers should curb this! arxiv.org/abs/2603.16567
Micah Benson
There's a lot of external pressure on AI ethics to produce solutions instead of critique. As someone who's worked a lot on CSAM, NCII, mental health, and creative harms of AI: if AI developers had only listened to critiques, we could have avoided all these harms in the first place.