Zang et al., "World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible" A Diffusion Transformer that estimates multiple layers of depth, so that occluded parts of the scene are estimated as well.
haoz19.github.io/world-tracin...

Zeng et al., "i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models" A fully reproducible recipe, with code, weights, and everything else needed for a truly open text-to-image model. A lot of interesting findings.
zlab-princeton.github.io/i1/

Dabhi and Gill et al., "2D-LFM: Lifting Foundation Model without 3D Supervision" Simply using transformers for 2D-to-3D lifting of 2D landmarks fails by construction, because the architecture is permutation equivariant. The fix is to inject positional encoding in multiple layers.
2dlfm.github.io

Xiangli et al., "Honey, I Shrunk the Arc de Triomphe!" Metric depth estimators aren't actually metric. With curated, scaled data, they can be adapted to do better.
metricscenes.github.io

Esmati and Nath et al., "The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show" You can use inversion to retrieve feature representations for a video, and these can be linearly decoded into physical plausibility, provided the inversion uses enough steps rather than shortcuts.
arxiv.org/abs/2606.05328
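
The permutation-equivariance point in the 2D-LFM summary can be checked numerically. Below is a minimal sketch, not the paper's model: a single-head self-attention layer in plain Python with random weights, where the names `self_attention`, `add_pos`, and the crude additive index offset standing in for a positional encoding are all assumptions made for illustration. Without any positional signal, shuffling the input landmarks just shuffles the outputs the same way, so the layer cannot tell which landmark is which. Once a position-dependent term is added, that symmetry breaks.

```python
import math
import random


def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention with no positional encoding (illustrative)."""
    def matvec(mat, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in mat]

    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(wt / total * vj[d] for wt, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def add_pos(tokens):
    """Hypothetical stand-in for a positional encoding: add an index-dependent offset."""
    return [[x + 0.5 * (i + 1) for x in t] for i, t in enumerate(tokens)]


rng = random.Random(0)
dim, n = 4, 5
wq, wk, wv = (random_matrix(rng, dim, dim) for _ in range(3))
landmarks = random_matrix(rng, n, dim)  # n made-up landmark tokens

perm = [3, 0, 4, 1, 2]  # a shuffle with no fixed points
shuffled = [landmarks[i] for i in perm]

# Without positions: output of the shuffled input equals the shuffled output.
out = self_attention(landmarks, wq, wk, wv)
out_shuffled = self_attention(shuffled, wq, wk, wv)
equivariance_gap = max_abs_diff(out_shuffled, [out[i] for i in perm])

# With positions: the same comparison no longer matches.
out_pos = self_attention(add_pos(landmarks), wq, wk, wv)
out_pos_shuffled = self_attention(add_pos(shuffled), wq, wk, wv)
positional_gap = max_abs_diff(out_pos_shuffled, [out_pos[i] for i in perm])

print("gap without positions:", equivariance_gap)
print("gap with positions:", positional_gap)
```

The first gap should sit at floating-point noise and the second should be clearly nonzero. The sketch only covers a single layer with one additive offset; the post describes injecting positional encoding in multiple layers, which this does not attempt to reproduce.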
2D-LFM: Lifting Foundation Model without 3D Supervision (2dlfm.github.io). Project page for 2D-LFM: Lifting Foundation Model without 3D Supervision.
World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible (haoz19.github.io).
MetricScenes (metricscenes.github.io).
i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models (zlab-princeton.github.io). Project page for i1, a simple and fully open recipe for strong text-to-image models.
The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show (arxiv.org). Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally...
Kwang Moo Yi