10/ This work was co-first-authored with Jerome Han, together with @benpry.bsky.social, @satchelgrant.bsky.social, @noahdgoodman.bsky.social, and @judithfan.bsky.social.
arXiv: arxiv.org/abs/2605.28742
Code: github.com/LinasNas/cor...
Website: linasnas.github.io/core-reasoni...
Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-parametric (e.g. prompt optimization) approaches to doing so ty...