Despite the huge inflow of researchers, much of the work in interpretability remains anecdotal. Our new repro challenge at BlackboxNLP (co-located with EMNLP 2026) aims to attract work that challenges common assumptions and shows where popular methods fail or succeed. Negative results welcome!
Gabriele Sarti
📣 Announcing the BlackboxNLP 2026 Reproducibility Challenge!
A new track dedicated to rigorous robustness checks of NLP interpretability work: stress-testing baselines, ablations, generalizability, and evaluation.