How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀
🚨 Enter ∆Belief-RL: We show how to use an agent’s own belief updates as a dense reward for turn-level credit assignment.
The result? Surprisingly strong generalization.
(1/8) 🧵⬇️