Even better: (iii) straightening encourages the latent Euclidean distance to better align with the geodesic distance; (iv) near-perfect reconstruction can be attained with a very low feature dimensionality (we can reduce the embedding dimension from 384 to 8!)
More details can be found in our paper: arxiv.org/abs/2603.12231. Many thanks to my collaborators @oumayma @gaoyuezhou.bsky.social @randall @timrudner.bsky.social and amazing advisors @yann-lecun.bsky.social @mengyer.bsky.social for their guidance and support!
What is a good latent space for world modeling and planning?
Inspired by the perceptual straightening hypothesis in human vision, we introduce temporal straightening to improve representation learning for latent planning.
agenticlearning.ai/temporal-str...
Inspired by the perceptual straightening hypothesis (human visual systems transform natural videos into straighter internal representations), we introduce a simple fix: jointly learn an encoder & a predictor (JEPA-style) with a regularizer on the curvature of latent trajectories.
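For intuition, here is a minimal NumPy sketch of how the curvature of a latent trajectory could be measured and penalized. It assumes curvature is the angle between successive displacement vectors, as is common in the perceptual straightening literature. The function names `mean_curvature` and `straightening_penalty` are hypothetical, and the exact regularizer in the paper may differ.

```python
import numpy as np

def _unit_displacements(z, eps=1e-8):
    """Unit-norm displacement vectors v_t = z_{t+1} - z_t for a trajectory z of shape (T, D)."""
    v = np.diff(z, axis=0)  # (T-1, D)
    norms = np.maximum(np.linalg.norm(v, axis=1, keepdims=True), eps)
    return v / norms

def mean_curvature(z):
    """Mean angle (in radians) between successive displacement vectors."""
    v = _unit_displacements(z)
    cos = np.sum(v[:-1] * v[1:], axis=1)  # (T-2,)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

def straightening_penalty(z):
    """Assumed penalty: mean (1 - cosine similarity) of successive displacements.
    It is 0 for a perfectly straight trajectory and grows as the path bends."""
    v = _unit_displacements(z)
    cos = np.sum(v[:-1] * v[1:], axis=1)
    return float(np.mean(1.0 - cos))
```

In a JEPA-style setup, a term like this would presumably be added to the predictor loss with some weight, written in an autodiff framework so gradients reach the encoder. That weighting is also an assumption here.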
Large-scale visual pretraining is useful but NOT enough! It's not tailored to the dynamics of the environment and retains many planning-irrelevant low-level details. E.g., in DINOv2 feature space, the latent trajectories are curved & L2 distances don't reflect geodesic distances.
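One rough way to see the gap between L2 and geodesic distance is to compare the straight-line distance between a trajectory's endpoints with the length of the path actually traveled. The sketch below assumes the path length along the trajectory is a reasonable proxy for geodesic distance. The helper name `endpoint_to_path_ratio` is hypothetical and is not taken from the paper.

```python
import numpy as np

def endpoint_to_path_ratio(z):
    """Ratio of the endpoint L2 distance to the path length of a trajectory z of shape (T, D).
    Equals 1.0 for a straight trajectory and drops toward 0 as the path curves."""
    steps = np.linalg.norm(np.diff(z, axis=0), axis=1)  # per-step lengths
    path_len = steps.sum()
    chord = np.linalg.norm(z[-1] - z[0])
    return float(chord / path_len) if path_len > 0 else 1.0
```

A ratio well below 1 on encoder features would suggest that L2 distance underestimates how far apart two states are along the trajectory.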
The resulting embedding space has many good properties! We find that (i) implicit straightening can happen when training the encoder using the predictor loss alone; (ii) adding straightening regularization further decreases curvature of the resulting embeddings;
Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contai...
Straightening also makes the loss landscape closer to convex and better conditioned, improving gradient-based planning. We test on four goal-reaching tasks and observe a significant boost in open-loop and MPC success rate using gradient descent.
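To make "gradient-based planning" concrete, here is a toy open-loop planner that runs gradient descent on an action sequence to minimize the squared latent distance to a goal. It is only a sketch: the linear latent dynamics `z' = z + B a`, the matrix `B`, and the names `rollout` and `plan` are illustrative assumptions, standing in for the learned predictor used in the paper.

```python
import numpy as np

def rollout(z0, actions, B):
    """Roll out the assumed toy dynamics z_{t+1} = z_t + B a_t and return the final latent."""
    z = z0.copy()
    for a in actions:
        z = z + B @ a
    return z

def plan(z0, z_goal, B, horizon=5, lr=0.05, iters=200):
    """Open-loop planning by gradient descent on the cost ||z_T - z_goal||^2."""
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(iters):
        err = rollout(z0, actions, B) - z_goal
        # For this linear model the gradient w.r.t. every a_t is 2 B^T err.
        grad = 2.0 * (B.T @ err)
        actions -= lr * grad  # broadcast the same gradient over all timesteps
    return actions
```

With a learned nonlinear predictor the gradient would come from autodiff instead, and the cost surface is no longer guaranteed to be this well behaved. That is where a straighter latent space is reported to help.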