Post
Nice paper on how well benchmarks cover what people do at work: arxiv.org/abs/2603.01203. Quote: "these observations suggest that agent benchmarking effort is driven less by alignment with real-world employment structure or economic value, and more by methodological convenience."
2mo
AI agents are increasingly developed and evaluated on benchmarks relevant to human work, yet it remains unclear how representative these benchmarking efforts are of the labor market as a whole. In thi...
arxiv.org
How Well Does Agent Development Reflect Real-World Work?