👉 Learns information-seeking strategies that generalise out of distribution (6/8) Despite being trained solely on 20 Questions, the agent's skills transfer to new out-of-distribution (OOD) tasks, such as customer service and user personalisation 👥
4mo
This suggests a shift in how we train agents: instead of relying on external critics or verifiers, 👉 let agents learn by tracking their own uncertainty reduction. A step toward agents that reason about what they don’t know. (7/8)
✳️ Benefits of ∆Belief-RL. ✔️ turn-level credit assignment ✔️ O(N) information per trajectory ✔️ learning even from failed episodes All while keeping training compute-efficient. (3/8)
How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: We show how to use an agent’s own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️
The result is an agent that solves open-ended tasks: CIA 🕵️‍♀️ Curious Information-seeking Agent 🕵️‍♂️ 👉 CIA beats DeepSeek-V3.2 on our evaluations (4/8)
💡 Key idea: 👉 Use the change in the agent’s belief about the correct answer as a dense intrinsic reward. If an action increases log p(target | history) → reward it. We call this ∆Belief-RL. No critic. No process reward model. Just the agent judging its own progress. (2/8)
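The key idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `delta_belief_rewards` and the belief numbers are hypothetical, and the thread does not say how log p(target | history) is computed or whether the reward is clipped, scaled, or combined with an outcome reward. The sketch simply takes one log-belief per turn as given and rewards each turn with the change it produced.

```python
import math
from typing import List, Sequence


def delta_belief_rewards(log_beliefs: Sequence[float]) -> List[float]:
    """Turn-level rewards from successive belief estimates.

    log_beliefs[0] is log p(target | empty history); log_beliefs[t] is the
    same quantity after the t-th turn. How these values are obtained (for
    example, by scoring the target under the agent's own model) is an
    assumption of this sketch and is not specified in the thread.
    """
    # Reward for turn t = log-belief after the turn minus log-belief before it.
    return [after - before for before, after in zip(log_beliefs, log_beliefs[1:])]


# Hypothetical belief trajectory for a 3-turn game (made-up numbers).
beliefs = [0.01, 0.05, 0.04, 0.40]
rewards = delta_belief_rewards([math.log(p) for p in beliefs])

for turn, reward in enumerate(rewards, start=1):
    print(f"turn {turn}: reward = {reward:+.3f}")
```

In this toy trajectory the second turn lowers the belief in the target and so receives a negative reward, while the other two turns are rewarded. The per-turn rewards sum to the total change in log-belief over the episode, so every turn carries a signal even if the episode never reaches the correct answer.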
👉 Continues to seek information beyond the training horizon The results suggest that ∆Belief rewards generalize to longer horizons; they teach general information-seeking strategies that continue to resolve uncertainty as more evidence becomes available. 🔎 (5/8)
Blogpost: bethgelab.github.io/delta-belief... Paper: alphaxiv.org/abs/intrinsi... Code: github.com/bethgelab/de... A massive thanks to my collaborators Ilze Amanda Auzina, Sergio Hernández Gutiérrez, Shashwat Goel, @bayesiankitten.bsky.social and Matthias Bethge (8/8) @bethgelab.bsky.social
Joschka Strüber @Tuebingen AI Center🇩🇪
🚨 New paper alert! 🚨 We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species. A 🧵👇 (1/9)
🚀 We're hiring! The @ellisinsttue.bsky.social leads the AI development for Germany’s new open-source nationwide Adaptive Intelligent System learning platform for schools (as part of a consortium led by Assecor & KI macht Schule, and mandated by the FWU). 👉 Apply now: forms.gle/XmLkwEDD45fY...
Mar 14, 2025
6mo
Wieland Brendel
Federico D’Agostino