
👉 Learns information-seeking strategies that generalise out of distribution (OOD). Despite being trained solely on 20 Questions, the agent's skills transfer to new OOD tasks, such as customer service and user personalisation 👥 (6/8)

4mo

This suggests a shift in how we train agents: instead of relying on external critics or verifiers, 👉 let agents learn by tracking their own uncertainty reduction. A step toward agents that reason about what they don’t know. (7/8)

✳️ Benefits of ∆Belief-RL: ✔️ turn-level credit assignment ✔️ O(N) information per trajectory ✔️ learning even from failed episodes. All while keeping training compute-efficient. (3/8)

How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: we show how to use an agent’s own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️

The result: an agent that solves open-ended tasks, CIA 🕵️‍♀️ Curious Information-seeking Agent 🕵️‍♂️ 👉 CIA beats DeepSeek V3.2 on our evaluations (4/8)

4mo

💡 Key idea: 👉 Use the change in the agent’s belief about the correct answer as a dense intrinsic reward. If an action increases log p(target | history), reward it. We call this ∆Belief-RL. No critic. No process reward model. Just the agent judging its own progress. (2/8)
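To make the key idea concrete, here is a minimal sketch of a turn-level reward computed as the change in log p(target | history). It assumes the log-beliefs have already been obtained from the agent after each turn. The function name `delta_belief_rewards` and the example numbers are hypothetical and are not taken from the paper or its code. The post does not say how belief decreases are handled, so this sketch simply leaves them as negative rewards.

```python
import math
from typing import List, Sequence


def delta_belief_rewards(log_beliefs: Sequence[float]) -> List[float]:
    """Return one reward per turn: the change in log p(target | history).

    log_beliefs[0] is the log-belief before any turn; log_beliefs[t] is the
    log-belief after turn t. (Hypothetical helper, for illustration only.)
    """
    if len(log_beliefs) < 2:
        return []
    return [after - before for before, after in zip(log_beliefs, log_beliefs[1:])]


# Made-up beliefs for a 3-turn game of 20 Questions:
# turn 1 helps, turn 2 slightly hurts, turn 3 helps a lot.
beliefs = [0.01, 0.05, 0.04, 0.40]
log_beliefs = [math.log(p) for p in beliefs]
rewards = delta_belief_rewards(log_beliefs)

for turn, reward in enumerate(rewards, start=1):
    print(f"turn {turn}: reward = {reward:+.3f}")
```

Each turn gets its own signal, so an episode that never reaches the correct answer can still yield non-zero rewards. Under this definition the per-turn rewards sum to the total change in log-belief over the episode.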

👉 Continues to seek information beyond the training horizon. The results suggest that ∆Belief rewards generalize to longer horizons; they teach general information-seeking strategies that continue to resolve uncertainty as more evidence becomes available. 🔎 (5/8)

4mo

Blogpost: bethgelab.github.io/delta-belief... Paper: alphaxiv.org/abs/intrinsi... Code: github.com/bethgelab/de... A massive thanks to my collaborators Ilze Amanda Auzina, Sergio Hernández Gutiérrez, Shashwat Goel, @bayesiankitten.bsky.social and Matthias Bethge (8/8) @bethgelab.bsky.social

Joschka Strüber @Tuebingen AI Center🇩🇪

4mo

🚨 New paper alert! 🚨 We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species. A 🧵👇 (1/9)

Joschka Strüber @Tuebingen AI Center🇩🇪

🚀 We're hiring! The @ellisinsttue.bsky.social leads the AI development for Germany’s new open-source nationwide Adaptive Intelligent System learning platform for schools (as part of a consortium led by Assecor & KI macht Schule, and mandated by the FWU). 👉 Apply now: forms.gle/XmLkwEDD45fY...

Mar 14, 2025

