How can an agent improve by leveraging a learned world model without being misled by model errors? I wrote a blog post about our #ICLR2026 paper, Deep SPI, where I explain the main idea behind safe policy improvement in RL via world models. -> delgrange.me/post/deep_spi/

A long-form explainer of Deep SPI: why ordinary on-policy auxiliary losses break after policy updates, how world models and neighborhood-constrained updates fix this, and what the resulting algorithm ...