
If you find yourself asking "How does this model checkpoint differ from the last one, and where did it improve or regress?", that's what olmo-eval is for. We're releasing it openly so the community can build on it. 💻 Code: buff.ly/veAANKX 📝 Blog: buff.ly/64B7dPh

GitHub repository: allenai/olmo-eval