How it works:
• Camera trajectories are embedded directly into the diffusion noise, with zero extra modules
• 3D rewards: Depth Anything 3 + Qwen3-VL act as geometry critics
• Periodic decoupled training: buildings stay rigid while flags still wave
• Trained on only 3K text prompts, no video data
Weijie Wang