👉 Learns information-seeking strategies that generalise out of distribution (OOD)
(6/8)
Despite being trained solely on 20 Questions, the agent's skills transfer to new OOD tasks such as customer service and user personalisation 👥
This suggests a shift in how we train agents:
Instead of external critics or verifiers,
👉 Let agents learn by tracking their own uncertainty reduction.
A step toward agents that reason about what they don’t know.
(7/8)
✳️ Benefits of ∆Belief-RL.
✔️ turn-level credit assignment
✔️ O(N) learning signals per trajectory (one per turn)
✔️ learning even from failed episodes
All while keeping training compute-efficient.
(3/8)
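To illustrate turn-level credit assignment and learning from failed episodes, here is a hedged Python sketch that compares a sparse outcome reward with ∆Belief rewards on one failed episode. The belief values are made-up illustrative data. The reward is assumed to be the turn-to-turn change in log p(target | history). The undiscounted returns-to-go are a generic stand-in, not necessarily the RL objective the paper uses. `delta_belief_rewards` and `returns_to_go` are hypothetical helper names.

```python
import math


def delta_belief_rewards(beliefs):
    """Assumed dense reward: r_t = log p(target | h_t) - log p(target | h_{t-1})."""
    logs = [math.log(p) for p in beliefs]
    return [after - before for before, after in zip(logs, logs[1:])]


def returns_to_go(rewards):
    """Undiscounted return G_t = r_t + r_{t+1} + ... credited to turn t."""
    returns, total = [], 0.0
    for r in reversed(rewards):
        total += r
        returns.append(total)
    return returns[::-1]


# Hypothetical failed 3-turn episode: p(target | history) before turn 1 and
# after each turn. The agent never names the target, so the outcome is 0.
beliefs = [0.05, 0.1, 0.1, 0.4]
sparse = [0.0, 0.0, 0.0]  # outcome-only reward: every turn gets zero credit
dense = delta_belief_rewards(beliefs)

print([round(g, 3) for g in returns_to_go(sparse)])  # [0.0, 0.0, 0.0]
print([round(g, 3) for g in returns_to_go(dense)])   # [2.079, 1.386, 1.386]
```

With the outcome reward alone, every turn of the failed episode receives the same zero credit. The ∆Belief rewards still separate the turns that raised the belief (turns 1 and 3) from the one that did not (turn 2), yielding one signal per turn.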
How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀
🚨 Enter ∆Belief-RL: We show how to use an agent’s own belief updates as a dense reward for turn-level credit assignment.
The result? Surprisingly strong generalization.
(1/8) 🧵⬇️
The result: CIA, an agent that solves open-ended tasks
🕵️‍♀️ Curious Information-seeking Agent 🕵️‍♂️
👉 CIA beats DeepSeek-V3.2 on our evaluations
(4/8)
💡 Key idea:
👉 Use the change in the agent’s belief about the correct answer as a dense intrinsic reward.
If an action increases log p(target | history) → reward it.
We call this ∆Belief-RL.
No critic. No process reward model. Just the agent judging its own progress.
(2/8)
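A minimal Python sketch of the ∆Belief reward. It assumes the per-turn reward is simply the difference in log p(target | history) between consecutive turns. In the thread the agent reads this belief off its own model; here a uniform belief over the candidates still consistent with the answers so far is a hypothetical stand-in. The names `CANDIDATES`, `log_belief`, `delta_belief_rewards` and `play` are illustrative only and are not taken from the released code.

```python
import math

# Hypothetical toy 20 Questions world: each candidate answer has a set of
# attributes, and each question asks "does the answer have attribute X?".
CANDIDATES = {
    "cat": {"animal", "mammal"},
    "dog": {"animal", "mammal"},
    "eagle": {"animal", "flies"},
    "shark": {"animal", "water"},
    "oak": {"plant"},
    "rose": {"plant"},
    "car": {"vehicle"},
    "boat": {"vehicle", "water"},
}


def log_belief(history):
    """Stand-in for log p(target | history): a uniform belief over the
    candidates consistent with every (question, answer) pair so far.
    Answers are truthful, so the target is always among them."""
    consistent = [
        name for name, attrs in CANDIDATES.items()
        if all((question in attrs) == answer for question, answer in history)
    ]
    return -math.log(len(consistent))


def delta_belief_rewards(log_beliefs):
    """Turn-level reward r_t = log p(target | h_t) - log p(target | h_{t-1})."""
    return [after - before for before, after in zip(log_beliefs, log_beliefs[1:])]


def play(target, questions):
    """Ask each question, answer it truthfully for `target`, record beliefs."""
    history = []
    log_beliefs = [log_belief(history)]  # belief before the first turn
    for question in questions:
        history.append((question, question in CANDIDATES[target]))
        log_beliefs.append(log_belief(history))
    return log_beliefs


log_beliefs = play("eagle", ["animal", "plant", "water", "mammal"])
rewards = delta_belief_rewards(log_beliefs)
print([round(r, 3) for r in rewards])  # [0.693, 0.0, 0.288, 1.099]
```

The redundant "plant" question earns exactly zero. Questions that rule out candidates are rewarded by how much they shrink the candidate set, in nats. The rewards telescope to log p(target | h_T) - log p(target | h_0), which is log 8 here.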
👉 Continues to seek information beyond the training horizon
The results suggest that ∆Belief rewards generalize to longer horizons; they teach general information-seeking strategies that continue to resolve uncertainty as more evidence becomes available. 🔎
(5/8)
Blogpost: bethgelab.github.io/delta-belief...
Paper: alphaxiv.org/abs/intrinsi...
Code: github.com/bethgelab/de...
A massive thanks to my collaborators Ilze Amanda Auzina, Sergio Hernández Gutiérrez, Shashwat Goel, @bayesiankitten.bsky.social and Matthias Bethge
(8/8)
@bethgelab.bsky.social
Joschka Strüber @Tuebingen AI Center🇩🇪
🚨 New paper alert! 🚨
We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species.
A 🧵👇 (1/9)
🚀 We're hiring! The @ellisinsttue.bsky.social leads the AI development for Germany’s new open-source nationwide Adaptive Intelligent System learning platform for schools (as part of a consortium led by Assecor & KI macht Schule, and mandated by the FWU).
👉 Apply now: forms.gle/XmLkwEDD45fY...