RL + LLM @ai2.bsky.social; main dev of https://cleanrl.dev/

Costa Huang

Congrats on the launch!

That's all. Enjoy the new model 😆

One fun thing is that our model outperformed Qwen by ~26 points on IFEval. What's going on? We built some nice visualization tools and found that our model is simply good at following instructions like "write without a comma".
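For anyone wondering what "following the instruction" means in practice: a constraint like "write without a comma" can be checked with plain code. Below is a minimal Python sketch of such a check. The function names are hypothetical and this is not the IFEval implementation, just an illustration of the idea under the assumption that the constraint means "no `,` character anywhere in the response".

```python
def follows_no_comma(response: str) -> bool:
    """Return True if the response contains no comma character."""
    return "," not in response


def no_comma_score(response: str) -> float:
    """Binary score (hypothetical helper): 1.0 if the constraint holds, else 0.0."""
    return 1.0 if follows_no_comma(response) else 0.0


# Two made-up responses: the first breaks the constraint, the second obeys it.
responses = [
    "Sure, here is a short poem.",
    "Here is a short poem with no pauses at all.",
]
scores = [no_comma_score(r) for r in responses]
print(scores)  # [0.0, 1.0]
```

A check like this is all-or-nothing, so a single stray comma fails the whole response.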

Our 1B model achieves impressive performance. See our official tweet for more details! bsky.app/profile/ai2....

The model checkpoints are available at huggingface.co/collections/.... As always, we uploaded all the intermediate RL checkpoints.

🥘 Excited to share our latest OLMo 1B models! Almost summer RL time. We did another two-stage RL:

* The first RLVR run uses allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
* The final RLVR run uses allenai/RLVR-MATH for targeted MATH improvement

Short 🧵

We streamlined our release process to include the RLVR intermediate checkpoints as well. They are available in the revisions if you want to check them out. See our updated collection here: huggingface.co/collections/...