Post
Excited to see our #COLM2025 paper on fluid benchmarking highlighted by @eval-eval.bsky.social! They are worth a follow if you are into LLM eval research. 🔬
7mo
Valentin Hofmann
✨ Weekly AI Evaluation Paper Spotlight ✨ 🤔 Is it time to move beyond static tests and toward more dynamic, adaptive, and model-aware evaluation? 🖇️ "Fluid Language Model Benchmarking" by @valentinhofmann.bsky.social et al. introduces a dynamic benchmarking method for evaluating language models
7mo
EvalEval Coalition