Inlay

Inspired by the perceptual straightening hypothesis (the human visual system transforms natural videos into straighter internal representations), we introduce a simple fix: jointly learn an encoder and a predictor (JEPA-style) while regularizing the curvature of latent trajectories.
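
To make the idea of a curvature regularizer concrete, here is a minimal sketch of one way such a penalty could be computed. It is an illustration under stated assumptions, not necessarily the formulation used here: curvature is taken to be one minus the cosine of the angle between consecutive latent displacement vectors, and the function name `curvature_penalty` and the parameter `eps` are hypothetical. It uses plain Python for readability; in training one would presumably express the same computation in an autodiff framework.

```python
import math

def curvature_penalty(z, eps=1e-8):
    """Mean (1 - cosine) between consecutive displacement vectors.

    z: sequence of T latent vectors (each a sequence of floats), T >= 3.
    Assumption: this cosine-based measure stands in for "curvature";
    the source does not specify the exact definition.
    """
    if len(z) < 3:
        raise ValueError("need at least 3 latent states to measure curvature")
    # Displacements v_t = z_{t+1} - z_t along the trajectory.
    v = [[b - a for a, b in zip(z[t], z[t + 1])] for t in range(len(z) - 1)]
    total = 0.0
    for v0, v1 in zip(v[:-1], v[1:]):
        dot = sum(a * b for a, b in zip(v0, v1))
        n0 = math.sqrt(sum(a * a for a in v0))
        n1 = math.sqrt(sum(a * a for a in v1))
        # eps guards against division by zero for stationary steps.
        total += 1.0 - dot / (n0 * n1 + eps)
    return total / (len(v) - 1)

# A straight trajectory has (near) zero penalty; a right-angle zigzag does not.
straight = [[float(t), 2.0 * t] for t in range(5)]
zigzag = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [2.0, 2.0]]
straight_penalty = curvature_penalty(straight)
zigzag_penalty = curvature_penalty(zigzag)
```

Under this sketch, the penalty would be added to the JEPA-style prediction loss with some weighting coefficient (a hypothetical hyperparameter), so that the encoder is encouraged to produce latent trajectories that are both predictable and close to straight.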