Introducing bluffbench, a new tool to evaluate how well LLMs actually see data plots. When we trick LLMs with secret #RStats transformations, they can miss the visual contradiction. bluffbench helps us measure this "blind spot" in AI coding agents. Learn more: posit.co/blog/introdu...

Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.