RL + LLM @ai2.bsky.social; main dev of https://cleanrl.dev/
Costa Huang
Congrats on the launch!
That's all. Enjoy the new model!
One fun thing: our model outperformed Qwen by almost 26 points on IFEval. What's going on? We built some nice visualization tools and found that our model is simply good at following instructions like "write without a comma".
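For readers wondering what a constraint like "write without a comma" looks like as a verifiable check, here is a minimal sketch. The function names `follows_no_comma` and `no_comma_reward` are hypothetical and chosen for illustration; this is not the actual IFEval or Ai2 implementation, just one way such a constraint could be turned into a binary reward.

```python
def follows_no_comma(response: str) -> bool:
    """Return True if the response contains no comma characters."""
    return "," not in response


def no_comma_reward(response: str) -> float:
    """Hypothetical binary reward: 1.0 if the constraint holds, else 0.0."""
    return 1.0 if follows_no_comma(response) else 0.0


if __name__ == "__main__":
    for text in ["Hello world", "Hello, world"]:
        print(repr(text), no_comma_reward(text))
```

The point is that the check is fully programmatic, so no learned judge is needed to decide whether the instruction was followed.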
Our 1B model achieves impressive performance. See our official tweet for more details!
bsky.app/profile/ai2....
The model checkpoints are available at huggingface.co/collections/....
As always, we uploaded all the intermediate RL checkpoints.
Excited to share our latest OLMo 1B models! Almost summer RL time. We did another two-stage RL:
* The first RLVR run uses allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
* The final RLVR run uses allenai/RLVR-MATH for targeted MATH improvement
Short 🧵
We streamlined our release process to include the RLVR intermediate checkpoints as well. They are available in the revisions if you want to check them out.
See our updated collection here: huggingface.co/collections/...