Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as principled, scientific evaluation? What if it met scientific standards?
arxiv.org/abs/2605.05508
coming to EvalEval at ACL as an oral 🧵
1/6