We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀
Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io
As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently...
arxiv.org
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
7mo
Amanda Bertsch