How can an agent improve by leveraging a learned world model without being misled by model errors? I wrote a blog post about our #ICLR2026 paper, Deep SPI, where I explain the main idea behind safe policy improvement in RL via world models. -> delgrange.me/post/deep_spi/
1mo
A long-form explainer of Deep SPI: why ordinary on-policy auxiliary losses break after policy updates, how world models and neighborhood-constrained updates fix this, and what the resulting algorithm ...
delgrange.me
Deep SPI: Safe Policy Improvement via World Models | Florent Delgrange
Florent Delgrange