I don’t know why it took me years to realize that airlines let you just bring lunch on a plane! Why did I go hungry for so long on domestic flights???
(Yes, I promised myself I would try to use more hype-y Gen Z style emojis 👨‍🦳)
👀 Check out @marcelhussing.bsky.social and @liv-daliberti.bsky.social's amazing new work on behavioral consistency!
❓The challenge? Retraining an RL agent can give you a completely different policy than before! This makes everything harder, as we never quite know whether we simply got unlucky 🤔
💡The insight? By carefully regularizing the policy's entropy, we can guarantee consistent policies across independent retraining runs, without access to any information from the other runs!
For details, check out @marcelhussing.bsky.social's post and the linked paper.
Got a great TMLR paper but missed the RLC deadline?
Following last year’s success, @RL_Conference is back with a Journal-to-Conference track! Accepted TMLR papers within scope are invited to submit for consideration.
Please submit here: docs.google.com/forms/d/e/1F...