Read the full paper: arxiv.org/abs/2606.06533 or come listen to our oral @icmlconf.bsky.social!
Huge thanks to my co-authors @aflah02101.bsky.social Niloofar @catherinearnett.bsky.social @fbarez.bsky.social @nsaphra.bsky.social
Stay tuned for a related workshop (hopefully) at NeurIPS too!
What would it mean to have a scientific understanding of AI? Models are not static objects: they are snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dy...
Post hoc analysis can certainly be useful, especially if you’re primarily concerned with the behavior of a specific deployed model. But looking at a static model will not tell you why the model developed a behavior. The real causal story must go back to the training process.
Part of why post hoc analysis dominates: it's the only thing most researchers CAN do. Almost no one releases intermediate checkpoints or training data. We built MultiBERTs and Pythia to set a better standard, and it's been great to see work like OLMo and Marin follow our lead.
A test for progress: a science of AI should support progressively stronger forms of understanding.
1. Predict outcomes from early training signals
2. Intervene to correct trajectories on undesirable paths
3. Design training procedures that reliably produce desired properties
In film, "we'll fix it in post" is what you say when something went wrong on set and you don't want to redo it. AI research has made it our entire methodology: train the model, then patch whatever comes out. Our new ICML oral argues this can't be the basis of a science of AI. 🧵