New paper 🚨 #ICLR26
Most world models predict the future from a past trajectory. But neuroscience suggests that such inference can instead be made from temporally independent experiences.
We built the Episodic Spatial World Model (ESWM), a model that does exactly this:
Video abstract [1/2]
[5/8] ESWM also supports efficient exploration by acting on uncertainty to collect experiences and navigate between states.
Video abstract [2/2]
[6/8] When environments change (e.g., new obstacles), ESWM adapts by updating its temporally and spatially independent memories. No retraining is needed.
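Why no retraining is needed can be pictured with a toy symbolic sketch. It assumes a hand-written breadth-first planner over a stored set of (state, action, next_state) transitions in a hypothetical 2x2 room (cells A B / C D). When a new obstacle appears, only the memory set is edited, and the same planner is queried again. ESWM uses a learned, meta-trained model instead of this planner, and `memory` and `plan` are hypothetical names, not the paper's code.

```python
from collections import deque

# Hypothetical toy memory of independent transitions (state, action, next_state)
# for a 2x2 room laid out as  A B / C D.
memory = {
    ("A", "right", "B"), ("B", "left", "A"),
    ("A", "down", "C"), ("C", "up", "A"),
    ("B", "down", "D"), ("D", "up", "B"),
    ("C", "right", "D"), ("D", "left", "C"),
}


def plan(memory, start, goal):
    """Breadth-first search over whatever transitions are currently stored.
    Returns the list of states from start to goal, or None if unreachable."""
    graph = {}
    for s, _action, s_next in sorted(memory):  # sorted for a deterministic result
        graph.setdefault(s, []).append(s_next)
    queue, parent = deque([start]), {start: None}
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:  # walk back through parents
                path.append(state)
                state = parent[state]
            return path[::-1]
        for s_next in graph.get(state, []):
            if s_next not in parent:
                parent[s_next] = state
                queue.append(s_next)
    return None


print(plan(memory, "A", "D"))  # ['A', 'C', 'D']

# A new obstacle occupies cell C: drop every stored transition touching C.
# Nothing is retrained; only the memory set changes.
memory = {t for t in memory if "C" not in (t[0], t[2])}
print(plan(memory, "A", "D"))  # ['A', 'B', 'D']
```

The point of the sketch is the division of labor: the inference procedure stays fixed, and adaptation is just an edit to the set of stored experiences.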
[1/8] Existing world models rely on a sequence of observations to predict future states. This leads to 1) redundancy from temporal overlap (contexts grow large in big environments) and 2) limited adaptability when environments change, because of the temporal dependency.
[4/8] In GridWorld experiments: 1) A Transformer backbone >> LSTM & Mamba. 2) ESWM generalizes to novel observations and structures. 3) Its latent space reflects the environment structure. 4) It predicts by integrating independent transitions.
[7/8] Beyond GridWorld, ESWM scales to the more complex MiniGrid (high-dimensional observations) and to ProcTHOR 3D indoor scenes (realistic pixel observations).
[8/8] We believe ESWM points to a new generation of brain-inspired models—ones that reason over fragments, generalize across structure, and adapt efficiently to change.
👥W/ @maximemdaigle.bsky.social, @bashivan.bsky.social
Read the full paper: arxiv.org/abs/2505.13696
[2/8] In contrast, neuroscience evidence suggests that animals can build spatial representations from independent experiences (e.g., day 1: A->B, day 2: B->C, day 3: infer A->C). Motivated by these observations, we introduce ESWM:
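The day 1 / day 2 / day 3 example can be made concrete with a toy symbolic sketch: each experience is stored as an independent (state, action, next_state) triple in an unordered set, and triples are stitched together by breadth-first search to infer the never-experienced A->C route. This only illustrates the inference problem under that assumption; ESWM learns the inference with a meta-trained neural network, and `memory` and `infer_path` are hypothetical names.

```python
from collections import deque

# Hypothetical toy memory: each triple is one independent experience
# (state, action, next_state), collected on a different "day".
memory = {
    ("A", "right", "B"),  # day 1
    ("B", "right", "C"),  # day 2
}


def infer_path(memory, start, goal):
    """Chain stored transitions (in any order) into an action sequence from
    start to goal. Returns None if the memory does not connect them."""
    # Build an adjacency map; the order in which experiences arrived is irrelevant.
    graph = {}
    for s, a, s_next in memory:
        graph.setdefault(s, []).append((a, s_next))
    # Breadth-first search over the stitched-together transitions.
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for a, s_next in graph.get(state, []):
            if s_next not in visited:
                visited.add(s_next)
                queue.append((s_next, actions + [a]))
    return None


print(infer_path(memory, "A", "C"))  # day 3: ['right', 'right']
```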
[3/8] ESWM is designed to operate on sets of temporally independent transitions: given such a set, it infers unseen transitions. The model is meta-trained across environments to support generalization. We validate ESWM in three settings.
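A minimal sketch of what one meta-training example could look like, under an assumed format that is not taken from the paper: sample an environment, draw an unordered context set of k transitions, and hold out one further transition whose next state is masked and must be inferred. The corridor environment and the helpers `corridor_transitions` and `make_episode` are hypothetical; only the "set in, unseen transition out, many environments" structure comes from the thread.

```python
import random


def corridor_transitions(n):
    """Hypothetical environment: every (state, action, next_state) triple of a
    1-D corridor with cells 0..n-1, where "left"/"right" move one cell."""
    triples = []
    for s in range(n):
        if s + 1 < n:
            triples.append((s, "right", s + 1))
        if s - 1 >= 0:
            triples.append((s, "left", s - 1))
    return triples


def make_episode(transitions, k, rng):
    """Assumed episode format: an unordered context set of k transitions plus
    one held-out query whose next_state is masked (None) and is the target."""
    sampled = rng.sample(transitions, k + 1)  # sampling without replacement
    query, context = sampled[0], sampled[1:]
    rng.shuffle(context)  # order carries no temporal information
    masked_query = (query[0], query[1], None)
    return context, masked_query, query[2]


rng = random.Random(0)
for _ in range(3):  # each episode uses a freshly sampled environment
    env = corridor_transitions(n=rng.randint(5, 8))
    context, masked_query, target = make_episode(env, k=4, rng=rng)
    print(len(env), masked_query, "->", target)
```

Because the held-out transition never appears in its own context set, the model can only answer by relating it to the other, independently sampled transitions.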