2/2 🚀 Our new paper below tackles two major issues in offline RL: high online sample complexity and the lack of online performance guarantees. It obtains accurate regret estimates and performs competitively with the best online hyperparameter tuning methods,
both using only offline data! 👇