How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: We show how to use an agent’s own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️