🧵1/ 🚨 New paper: "A Sober Look at Progress in Language Model Reasoning"

We re-evaluate recent SFT and RL models for mathematical reasoning and find that most gains vanish under rigorous, multi-seed, standardized evaluation.

📊 bethgelab.github.io/sober-reason...
📄 arxiv.org/abs/2504.07086