This was a very fun project. In Behavior-Consistent Deep RL, we provide a method that aligns the behavior of independently trained policies. It turns out this works even in high-dimensional spaces. Here are 6 seeds of Humanoids (all with roughly the same return). (left) Baseline (right) Ours.
Marcel Hussing
🚨 New Preprint Alert: Behavior-Consistent Deep Reinforcement Learning 🚨
TLDR: We introduce an approach that achieves behavioral similarity across independent algorithm executions in deep RL with continuous state-action spaces.