With the large influx of submissions and a faster pace of research, reproducibility is more important than ever.
With this reproducibility challenge, we want to put the focus on best practices for baselines 🧱, ablations 🌈, evaluation 🔎, and generalizability 🗺️ in interpretability!
Martin Tutek
📣 Announcing the BlackboxNLP 2026 Reproducibility Challenge!
A new track dedicated to rigorous robustness checks of NLP interpretability work - stress-testing baselines, ablations, generalizability, and evaluation.