Introducing OLMo-2-0325-32B-Instruct! It's spring RL curve time. This time, we used GRPO for RLVR and trained a pretty nice fully open-source model!
🥘 Excited to share our latest OLMo 1B models! It's almost summer RL time. We did another two-stage RL:
* The first RLVR run uses allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
* The final RLVR run uses allenai/RLVR-MATH for targeted MATH improvement
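For readers new to the terms, here is a rough sketch of the two ideas behind these runs: a verifiable reward (RLVR) and the group-relative advantage that GRPO computes from it. This is illustrative only and not the actual training code; the function names, the last-token answer check, the sample completions, and the use of population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Hypothetical checker: 1.0 if the last whitespace-separated token
    of the completion equals the reference answer, else 0.0."""
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == ground_truth else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against the mean and
    standard deviation of its own group of samples for one prompt.
    (Population std is an assumption; implementations differ.)"""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions (made-up data for illustration).
completions = [
    "The answer is 42",
    "I think it is 41",
    "So we get 42",
    "Answer: 7",
]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Completions that pass the check get a positive advantage and the rest get a negative one, so no separate learned reward model is needed for the signal.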
Short 🧵