Fable 5 is the new highest scorer on bluffbench! The eval measures models' ability to accurately describe plots that show counterintuitive patterns. Six months ago, the strongest models were still in the single digits. simonpcouch.github.io/bluffbench/
11h
Simon P. Couch