
We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀

Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io

As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently...