Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as a principled, scientific form of evaluation, one that itself meets scientific standards?
arxiv.org/abs/2605.05508
Coming to EvalEval at ACL as an oral 🧵
1/6
🤗 Super excited to have this work out!
Turns out that by computing the angles 📐 between representations, you can pick out difficult data samples! This is useful for assembling hard test sets or more efficient training sets.
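A rough sketch of the general idea, under loud assumptions: the thread doesn't say which representations are compared, so this version compares each sample's representation with the mean representation of its own label and treats a larger angle as "more difficult". The function names and the centroid choice are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def angles_to_centroid(reps: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Angle (radians) between each sample's representation and the mean
    representation of its label. Hypothetical difficulty proxy."""
    angles = np.empty(len(reps))
    for label in np.unique(labels):
        mask = labels == label
        group = reps[mask]
        centroid = group.mean(axis=0)
        # Cosine similarity to the label centroid; eps guards against zero-norm vectors.
        denom = np.maximum(
            np.linalg.norm(group, axis=1) * np.linalg.norm(centroid), 1e-12
        )
        cos = (group @ centroid) / denom
        # Clip to [-1, 1] so floating-point error can't push arccos out of domain.
        angles[mask] = np.arccos(np.clip(cos, -1.0, 1.0))
    return angles


def hardest_k(reps: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k samples with the largest angle (assumed 'hardest')."""
    return np.argsort(-angles_to_centroid(reps, labels))[:k]


reps = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
labels = np.array([0, 0, 0])
print(hardest_k(reps, labels, 1))  # the [0, 1] sample points furthest from the centroid
```

Ranking by angle rather than raw distance ignores vector magnitude, which is one plausible reason an angular measure could be a useful signal; whether that matches the paper's reasoning is an assumption here.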
See more cool results and visuals in the 🧵