[1/n] Just wrapped up 7 months interning with @pcastr.bsky.social at Google DeepMind and I'm so excited to share our work: arxiv.org/abs/2602.10324.
TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior
[2/n] LLM agents are everywhere now: customer service, negotiations, even as human proxies for social science/market research.
[3/n] So how do their strategic behaviors actually differ from humans? We examined this question through the lens of behavioral game theory, using iterated rock-paper-scissors (IRPS).
[4/n] Frontier models (Gemini 2.5 Pro/Flash, GPT 5.1) win more and adapt much faster than humans, while smaller models like GPT OSS 120B actually get worse over time because they can’t integrate the long context.
[5/n] But what does the difference in win rates actually mean? To understand, we used AlphaEvolve to automatically discover interpretable behavioral models directly from gameplay data.
[6/n] Using this approach, we get actual programs that explain the behavior, which we can read and compare. Diagram for human program shown below.
[7/n] So what were the insights? Both humans and LLMs use value learning + opponent modeling, but frontier models maintain more sophisticated opponent models (3x3x3 transition matrices vs. simple length-3 vectors that track prior move frequencies).
[8/n] For me, it’s really cool that this aligns with the jump in theory-of-mind capabilities in recent LLMs (since opponent modeling in IRPS is basically a type of ToM).
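To make the opponent-modeling contrast in this thread concrete (length-3 frequency vectors vs. 3x3x3 transition matrices), here is a minimal Python sketch. It is not one of the discovered programs from the paper. The class names, the uniform pseudo-counts, the toy `cycling_opponent`, and especially the choice to index the 3x3x3 table by (my previous move, opponent's previous move, opponent's next move) are all assumptions made for illustration.

```python
ROCK, PAPER, SCISSORS = 0, 1, 2


def best_response(predicted_move):
    """Return the move that beats predicted_move (paper beats rock, and so on)."""
    return (predicted_move + 1) % 3


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


class FrequencyModel:
    """Length-3 vector counting how often the opponent has played each move."""

    def __init__(self):
        self.counts = [1.0, 1.0, 1.0]  # uniform pseudo-counts (assumed prior)

    def update(self, my_prev, opp_prev, opp_move):
        self.counts[opp_move] += 1.0  # context is ignored on purpose

    def predict(self, my_prev, opp_prev):
        total = sum(self.counts)
        return [c / total for c in self.counts]


class TransitionModel:
    """3x3x3 counts: [my previous move][opponent's previous move][opponent's next move].

    This indexing is a hypothetical reading of "3x3x3 transition matrices".
    """

    def __init__(self):
        self.counts = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]

    def update(self, my_prev, opp_prev, opp_move):
        self.counts[my_prev][opp_prev][opp_move] += 1.0

    def predict(self, my_prev, opp_prev):
        row = self.counts[my_prev][opp_prev]
        total = sum(row)
        return [c / total for c in row]


def cycling_opponent(my_prev, opp_prev):
    """Toy opponent (hypothetical): always plays the move after its own last move."""
    return (opp_prev + 1) % 3


def play(model, opponent_policy, rounds=300):
    """Best-respond to the model's most likely prediction; return the win rate."""
    my_prev, opp_prev = ROCK, ROCK
    wins = 0
    for _ in range(rounds):
        probs = model.predict(my_prev, opp_prev)
        my_move = best_response(argmax(probs))
        opp_move = opponent_policy(my_prev, opp_prev)
        if my_move == best_response(opp_move):
            wins += 1
        model.update(my_prev, opp_prev, opp_move)
        my_prev, opp_prev = my_move, opp_move
    return wins / rounds


if __name__ == "__main__":
    print("frequency model win rate: ", play(FrequencyModel(), cycling_opponent))
    print("transition model win rate:", play(TransitionModel(), cycling_opponent))
```

Against this toy opponent, the transition model can pick up the cycle because it conditions on the previous round, while the frequency model only sees overall counts and stays a step behind. The value-learning component mentioned in the thread is left out to keep the sketch short.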