Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as principled, scientific evaluation? What if it met scientific standards? arxiv.org/abs/2605.05508 coming to EvalEval at ACL as an oral 🧵 1/6
1d
Isabelle Lee @ ICML