PQN, a recently introduced value-based method (bsky.app/profile/matt...), collects data in a similar way to PPO. We see a similar trend as with PPO, but it is much less pronounced. It is possible that our findings are more closely tied to policy-based methods.
9/
Pablo Samuel Castro
Super excited to share that our paper, Simplifying Deep Temporal Difference Learning, has been accepted as a spotlight at ICLR! My fab collaborator Matteo Gallici and I have written a three-part blog on the work, so stay tuned for that! :)
@flair-ox.bsky.social
arxiv.org/pdf/2407.04811