How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: We show how to use an agent's own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️
4mo
Joschka Strüber @Tuebingen AI Center🇩🇪