10/ This work was co-first-authored with Jerome Han, together with @benpry.bsky.social, @satchelgrant.bsky.social, @noahdgoodman.bsky.social, and @judithfan.bsky.social. arXiv: arxiv.org/abs/2605.28742 Code: github.com/LinasNas/cor... Website: linasnas.github.io/core-reasoni...

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-parametric (e.g. prompt optimization) approaches to doing so ty...