Today, we are launching the Semble API 🥳 🚀
Here's how to use it 👇️
Skipped the atproto meetup yesterday to stay home and play with the new @semble.so API. #Irony #SorryNotSorry
New preprint!
We introduce a new benchmark, SciConBench, with 9.11k scientific questions derived from Cochrane Systematic Reviews.
We find evidence that frontier AI agents **cannot** synthesize scientific conclusions well.
A thread 🧵
w/ @hayoungjung.bsky.social & others!