One fun thing: our model outperformed Qwen by almost 26 points on IFEval. What's going on? We built some visualization tools and found that our model follows instructions like "write without a comma" well.
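Constraints like "write without a comma" can be checked by a program, which is what makes them usable as verifiable rewards. The sketch below assumes a hypothetical registry that maps a constraint id to a checker. The id follows IFEval's `punctuation:no_comma` naming, but the checker is an illustrative re-implementation, not IFEval's code or our training verifier.

```python
from typing import Callable, Dict, List

# Hypothetical registry: constraint id -> function returning True if satisfied.
# Illustrative only; not IFEval's or AI2's actual implementation.
CHECKERS: Dict[str, Callable[[str], bool]] = {
    "punctuation:no_comma": lambda text: "," not in text,
}


def constraint_reward(response: str, constraint_ids: List[str]) -> float:
    """Binary reward: 1.0 only if the response satisfies every listed constraint."""
    satisfied = all(CHECKERS[cid](response) for cid in constraint_ids)
    return 1.0 if satisfied else 0.0


print(constraint_reward("Hello world.", ["punctuation:no_comma"]))
print(constraint_reward("Hello, world.", ["punctuation:no_comma"]))
```

A binary reward like this is easy to verify exactly, which is why instruction-following constraints mix well with math problems in an RLVR data mix.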
Our 1B model achieves impressive performance. See our official tweet for more details!
bsky.app/profile/ai2....
The model checkpoints are available at huggingface.co/collections/....
As always, we uploaded all the intermediate RL checkpoints.
We streamlined our release process to include the intermediate RLVR checkpoints as well. They are available as revisions on the model repos if you want to check them out.
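If you want to browse those revisions programmatically, here is a minimal sketch. It assumes the intermediate checkpoints live as non-`main` branches whose names contain a step number; the actual naming scheme isn't stated here, so the sorting helper is hypothetical. `list_repo_refs` is a real `huggingface_hub` function, but fill in the repo id yourself from the collection linked above.

```python
import re
from typing import List


def sort_revisions_by_step(names: List[str]) -> List[str]:
    """Sort branch names by the last integer they contain (assumed to be a step).

    Assumption: checkpoint branches embed a step number; names without
    any digits (e.g. "main") are dropped.
    """
    numbered = []
    for name in names:
        digits = re.findall(r"\d+", name)
        if digits:
            numbered.append((int(digits[-1]), name))
    return [name for _, name in sorted(numbered)]


def list_checkpoint_revisions(repo_id: str) -> List[str]:
    """Fetch non-main branch names of a Hugging Face model repo, sorted by step."""
    from huggingface_hub import list_repo_refs  # needs network access

    refs = list_repo_refs(repo_id)
    branches = [b.name for b in refs.branches if b.name != "main"]
    return sort_revisions_by_step(branches)
```

Any returned name can then be passed as `revision=` to `transformers`' `AutoModelForCausalLM.from_pretrained` to load that intermediate checkpoint.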
See our updated collection here: huggingface.co/collections/...
Introducing OLMo-2-0325-32B-Instruct! It's spring RL curve time. This time, we used GRPO for RLVR and trained a pretty nice, fully open-source model!
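The core idea of GRPO (Group Relative Policy Optimization) is to drop the learned value baseline and instead normalize each sampled completion's reward against the other completions for the same prompt. The sketch below shows only that group-relative advantage step; the clipped policy-ratio objective and KL term are omitted, and it is not our open-source training code. The `eps` value and the use of population standard deviation are assumptions (implementations vary).

```python
import statistics
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for one prompt's sampled completions.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by a binary verifier.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
print(grpo_advantages([1.0, 1.0, 1.0]))       # all zeros: no learning signal
```

Note the second case: if every completion in a group gets the same verifiable reward, all advantages are zero, so that prompt contributes nothing to the update.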
Excited to share our latest OLMo 1B models! It's almost summer RL time. We did another round of two-stage RL:
* The first RLVR run uses allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
* The final RLVR run uses allenai/RLVR-MATH for targeted MATH improvement
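Both stages rely on rewards that can be checked against a reference answer. Here is a minimal sketch of a binary numeric-answer reward, assuming the reference is a plain number (GSM8K-style). The helper names are hypothetical, and the datasets' actual fields and our verifier aren't shown here; MATH answers are often LaTeX expressions that would need symbolic comparison instead.

```python
import re
from typing import Optional

# Matches an optionally negative integer or decimal.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[str]:
    """Return the last number in the text, ignoring thousands separators."""
    matches = _NUMBER.findall(text.replace(",", ""))
    return matches[-1] if matches else None


def math_reward(response: str, ground_truth: str) -> float:
    """Binary verifiable reward: 1.0 if the final number equals the reference."""
    predicted = extract_last_number(response)
    if predicted is None:
        return 0.0
    return 1.0 if float(predicted) == float(ground_truth) else 0.0


print(math_reward("So the answer is 1,234.", "1234"))
print(math_reward("I think it's 41", "42"))
```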
Short 🧵