//
sign in
Profile
by @danabra.mov
Profile
by @dansshadow.bsky.social
Profile
by @jimpick.com
AviHandle
by @danabra.mov
AviHandle
by @dansshadow.bsky.social
AviHandle
by @katherine.computer
EventsList
by @katherine.computer
ProfileHeader
by @dansshadow.bsky.social
ProfileHeader
by @danabra.mov
ProfileMedia
by @danabra.mov
ProfilePlays
by @danabra.mov
ProfilePosts
by @danabra.mov
ProfilePosts
by @dansshadow.bsky.social
ProfileReplies
by @danabra.mov
Record
by @atsui.org
Skircle
by @danabra.mov
StreamPlacePlaylist
by @katherine.computer
+ new component
Profile
Loading...








[9/n] Huge thanks to @pcastr.bsky.social, Daniel Kasenberg, @neurokim.bsky.social and everyone in Montreal 💙 I had an amazing experience exploring topics in behavioral game theory, cognitive science, and AI-driven scientific discovery, together with brilliant colleagues.
[4/n] Frontier models (Gemini 2.5 Pro/Flash, GPT 5.1) win more and adapt much faster than humans, while smaller models like GPT OSS 120B actually get worse over time because they can’t integrate the long context.
[7/n] So what were the insights? Both humans and LLMs use value learning + opponent modeling, but frontier models maintain more sophisticated opponent models (3x3x3 transition matrices vs simple size 3 vectors tracking of prior move frequencies).
[5/n] But what does the difference in win rates actually mean? To understand, we used AlphaEvolve to automatically discover interpretable behavioral models directly from gameplay data.
[6/n] Using this approach, we get actual programs that explain the behavior, which we can read and compare. Diagram for human program shown below.
[3/n] So how do their strategic behaviors actually differ from humans? We examined this question through the lens of behavioral game theory, using iterated rock-paper-scissors (IRPS).
[8/n] For me, it’s really cool that this aligns with the jump in theory-of-mind capabilities in recent LLMs (since opponent modeling in IRPS is basically a type of ToM)
[2/n] LLM agents are everywhere now: customer service, negotiations, even as human proxies for social science/market research
[1/n] Just wrapped up 7 months interning with @pcastr.bsky.social at Google DeepMind and I'm so excited to share our work: arxiv.org/abs/2602.10324. TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior
3mo
3mo
3mo
3mo
3mo
3mo
3mo
3mo
3mo
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang
Caroline Wang