Introducing bluffbench, a new tool to evaluate how well LLMs actually see data plots. When we trick LLMs with secret #RStats transformations, they can miss the visual contradiction. bluffbench helps us measure this "blind spot" in AI coding agents. Learn more: posit.co/blog/introdu...
When plotting, LLMs see what they expect to see - Posit
Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.
posit.co
Posit