New paper 🚨 #ICLR26 Most world models predict the future from a past trajectory. But neuroscience suggests that such inference can instead be made from temporally independent experiences. We built the Episodic Spatial World Model (ESWM), a model that does exactly this.

Video abstract [1/2]

[5/8] ESWM also supports efficient exploration by acting on uncertainty to collect experiences and navigate between states.
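One way to picture "acting on uncertainty" is to choose the action whose predicted outcome is least certain. The snippet below is only a sketch under assumptions: it measures uncertainty as the entropy of a predicted next-state distribution, and the `predictions` table is made up for illustration rather than taken from ESWM.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain_action(predictions):
    """Return the action whose predicted next-state distribution has the highest entropy."""
    return max(predictions, key=lambda action: entropy(predictions[action]))

# Hypothetical predicted next-state distributions for three actions.
# These numbers are invented for the example, not model outputs.
predictions = {
    "north": [0.97, 0.01, 0.01, 0.01],  # outcome almost known
    "east":  [0.25, 0.25, 0.25, 0.25],  # outcome unknown
    "south": [0.70, 0.10, 0.10, 0.10],  # outcome partly known
}

print(most_uncertain_action(predictions))
```

Taking the chosen action and storing the resulting transition would add one new experience to memory, which is the collection loop the post describes.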

Video abstract [2/2]

[6/8] When environments change (e.g., new obstacles), ESWM adapts by updating its temporally and spatially independent memories. No retraining is needed.
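A toy sketch of what "updating memories, no retraining" can look like: the memory is an unordered set of transitions, so adapting to a new obstacle only means swapping the affected entries. The states, actions, and the `reachable` helper are hypothetical, and the hand-written search stands in for the learned model.

```python
def reachable(memory, start):
    """States reachable from `start` using only the stored transitions."""
    frontier, seen = [start], {start}
    while frontier:
        state = frontier.pop()
        for s, _action, s_next in memory:
            if s == state and s_next not in seen:
                seen.add(s_next)
                frontier.append(s_next)
    return seen

# Hypothetical memory bank of (state, action, next_state) transitions.
memory = {("A", "east", "B"), ("B", "east", "C")}
before = reachable(memory, "A")

# A new obstacle blocks B -> C: replace the stale memory with what is now observed.
memory.discard(("B", "east", "C"))
memory.add(("B", "east", "B"))  # moving east from B now leaves the agent in B
after_block = reachable(memory, "A")

# Two newly collected experiences reveal a detour around the obstacle.
memory.update({("B", "south", "E"), ("E", "east", "C")})
after_detour = reachable(memory, "A")

print(sorted(before), sorted(after_block), sorted(after_detour))
```

Only the set contents change between the three queries; the inference procedure stays fixed throughout.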

[1/8] Existing world models rely on a sequence of observations to predict future states. This leads to: 1) redundancy from temporal overlap (context length grows in large environments), 2) limited adaptability when environments change, because of temporal dependency between observations.

[4/8] In GridWorld experiments: 1) Transformer >> LSTM & Mamba. 2) ESWM generalizes to novel observations and structures. 3) Its latent space reflects the environment structure. 4) It predicts by integrating independent transitions.

[7/8] Beyond GridWorld, ESWM scales to the more complex MiniGrid (high-dimensional observations) and to ProcTHOR 3D indoor scenes (realistic pixel observations).

[8/8] We believe ESWM points to a new generation of brain-inspired models—ones that reason over fragments, generalize across structure, and adapt efficiently to change. 👥W/ @maximemdaigle.bsky.social, @bashivan.bsky.social Read the full paper: arxiv.org/abs/2505.13696

[2/8] In contrast, neuroscience evidence suggests that animals can build spatial representations from independent experiences (e.g., day 1: A->B, day 2: B->C, day 3: infer A->C). Motivated by these observations, we introduce ESWM:

[3/8] ESWM is designed to operate on sets of temporally independent transitions. Given such a set, it infers unseen transitions. The model is meta-trained across environments to support generalization. We show three settings in which we validate ESWM.
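To make the task format concrete, here is a minimal sketch of the A->B, B->C, infer A->C example. It is not ESWM itself: the real model is a meta-trained neural network, while this stand-in uses an explicit breadth-first search over an unordered set of transitions. The names `memory` and `infer_path` are hypothetical.

```python
from collections import deque

# Hypothetical memory bank: an unordered set of (state, action, next_state)
# transitions, each collected independently of the others.
memory = frozenset({
    ("B", "east", "C"),  # e.g., experienced on day 2
    ("A", "east", "B"),  # e.g., experienced on day 1
})

def infer_path(memory, start, goal):
    """Breadth-first search over stored transitions.

    Returns the list of actions leading from `start` to `goal`,
    or None if the stored transitions do not connect them.
    """
    edges = {}
    for s, action, s_next in memory:
        edges.setdefault(s, []).append((action, s_next))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, s_next in sorted(edges.get(state, [])):
            if s_next not in seen:
                seen.add(s_next)
                queue.append((s_next, actions + [action]))
    return None

print(infer_path(memory, "A", "C"))  # ['east', 'east']
```

Because the memory is a set, the order in which the two experiences were stored does not affect the answer, which is the property the post highlights.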

3mo


Herbie(Zizhan) He