Inlay

Nice paper (arxiv.org/abs/2603.01203) on how well benchmarks cover what people do at work. Quote: "these observations suggest that agent benchmarking effort is driven less by alignment with real-world employment structure or economic value, and more by methodological convenience."

AI agents are increasingly developed and evaluated on benchmarks relevant to human work, yet it remains unclear how representative these benchmarking efforts are of the labor market as a whole. In thi...