Post
This is fascinating!
4d
New preprint! We introduce a new benchmark, SciConBench, with 9.11k scientific questions derived from Cochrane Systematic Reviews. We find evidence that frontier AI agents **cannot** synthesize scientific conclusions well. A thread 🧵 w/ @hayoungjung.bsky.social & others!
Ondrej Mottl 🌿💻📈🌍⏳
5d
Manoel Horta Ribeiro