🧵1/ 🚨 New paper: A Sober Look at Progress in Language Model Reasoning
We re-evaluate recent SFT and RL models for mathematical reasoning and find most gains vanish under rigorous, multi-seed, standardized evaluation.
bethgelab.github.io/sober-reason...
arxiv.org/abs/2504.07086