olmo-eval builds on our OLMES project, which made benchmark scores comparable & reproducible by standardizing how models are evaluated. But a final score is only part of the story—olmo-eval works across the intermediate experiments teams compare throughout model development.
1d
Ai2