Zang et al., "World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible" A Diffusion Transformer that estimates multiple layers of depth, so that occluded parts are estimated as well.

metricscenes.github.io

Zeng et al., “i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models” A fully reproducible recipe & code & weights and everything for a truly open text-to-image model. A LOT of interesting findings.

Xiangli et al., "Honey, I Shrunk the Arc de Triomphe!" Metric depth estimators aren't actually metric. With curated, scaled data, they can be adapted to be better.

2dlfm.github.io

Esmati and Nath et al., "The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show" You can use inversion to retrieve feature representations for a video, and these can be linearly decoded into physical plausibility, provided you use enough steps rather than shortcuts.
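To make "linearly decoded" concrete, here is a minimal sketch of a linear probe: a single linear layer with a sigmoid, trained on frozen features. Everything in it is an assumption for illustration. The features are synthetic Gaussian vectors standing in for inverted video features, the labels come from a hidden linear rule standing in for physical plausibility, and names such as `DIM`, `train_probe`, and `held_out` are hypothetical. It is not the paper's pipeline.

```python
import math
import random

rng = random.Random(0)
DIM = 8  # hypothetical feature dimension

# Synthetic stand-ins: random feature vectors, labelled by a hidden linear rule.
w_true = [rng.gauss(0, 1) for _ in range(DIM)]

def make_example():
    f = [rng.gauss(0, 1) for _ in range(DIM)]
    label = 1 if sum(a * b for a, b in zip(w_true, f)) > 0 else 0
    return f, label

train = [make_example() for _ in range(400)]
held_out = [make_example() for _ in range(200)]

def predict_prob(w, b, f):
    # Linear score followed by a sigmoid; the score is clamped to avoid overflow.
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(data, epochs=50, lr=0.1):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for f, y in data:
            err = predict_prob(w, b, f) - y  # gradient of the loss w.r.t. the score
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def accuracy(w, b, data):
    return sum((predict_prob(w, b, f) > 0.5) == (y == 1) for f, y in data) / len(data)

w, b = train_probe(train)
test_accuracy = accuracy(w, b, held_out)
print(f"held-out accuracy of the linear probe: {test_accuracy:.3f}")
```

The probe has no hidden layers, so high held-out accuracy indicates that the label is linearly readable from the features themselves rather than computed by the decoder.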

Dabhi and Gill et al., "2D-LFM: Lifting Foundation Model without 3D Supervision" Simply using transformers to do 2D-to-3D lifting of 2D landmarks fails by construction due to the permutation equivariance of the architecture. The fix is to inject positional encoding in multiple layers.
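The permutation equivariance mentioned here can be checked numerically. The sketch below is an illustration under stated assumptions, not the 2D-LFM architecture: it uses a single self-attention layer in plain Python, random weights, five random 2D landmarks, and a random positional encoding added only at the input (the post describes injecting it in multiple layers). All names (`self_attention`, `with_pos`, `perm`) are hypothetical.

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention with no positional information.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(0)

def rand(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

n, d = 5, 4
x = rand(n, 2)                       # five 2D landmarks (synthetic)
wq, wk, wv = rand(2, d), rand(2, d), rand(2, d)
perm = [3, 0, 4, 1, 2]               # a reordering of the landmarks

# Without positional encoding: permuting the input only permutes the output rows.
out = self_attention(x, wq, wk, wv)
out_perm = self_attention([x[i] for i in perm], wq, wk, wv)
equivariant_gap = max_abs_diff(out_perm, [out[i] for i in perm])

# With a positional encoding tied to the slot index, the equivariance is broken.
pos = rand(n, 2)

def with_pos(tokens):
    return [[a + b for a, b in zip(t, p)] for t, p in zip(tokens, pos)]

out_pe = self_attention(with_pos(x), wq, wk, wv)
out_pe_perm = self_attention(with_pos([x[i] for i in perm]), wq, wk, wv)
pe_gap = max_abs_diff(out_pe_perm, [out_pe[i] for i in perm])

print(f"gap without positional encoding: {equivariant_gap:.2e}")
print(f"gap with positional encoding:    {pe_gap:.2e}")
```

The first gap sits at floating-point rounding level, meaning the layer cannot tell which landmark occupies which slot. The second gap is clearly nonzero, because the positional encoding makes each token's representation depend on its index.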

arxiv.org/abs/2606.05328

haoz19.github.io/world-tracin...

zlab-princeton.github.io/i1/

Kwang Moo Yi

metricscenes.github.io

MetricScenes

arxiv.org

Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally...

The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show
