Nicolas Dufour

Postdoc at Kyutai http://nicolas-dufour.github.io

Surflo: Consistent 3D Surface Flow Model with Global State @antoine-guedon.bsky.social, Shu Nakamura, @nicolasdufour.bsky.social, Jiahui Lei, Ko Nishino, @akanazawa.bsky.social arxiv.org/abs/2606.13644

We introduce MIRO: a new paradigm for T2I model alignment that integrates reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control:
- 19x faster convergence ⚡
- 370x fewer FLOPs than FLUX-dev 📉

Thrilled to share that MIRO has been accepted to ICML 2026 @icmlconf.bsky.social! 🎉 By training on the reward scores, we can simply condition the model on high rewards at inference time to guarantee top-tier, aligned outputs. We’ve updated our paper with some additional results!
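The post describes the mechanism only at a high level: train with each image's reward scores as a condition, then ask for high rewards at inference. Here is a hypothetical sketch of that idea, assuming each reward is discretized into quantile bins and one-hot encoded into a conditioning vector. The reward names, `NUM_BINS`, and every function name are invented for illustration and are not taken from the MIRO codebase.

```python
import numpy as np

# Hypothetical reward names: the post does not list MIRO's actual reward models.
REWARDS = ["aesthetic", "text_alignment", "human_preference"]
NUM_BINS = 5  # assumed discretization granularity


def quantile_edges(dataset_scores):
    """Compute per-reward bin edges from training-set score quantiles."""
    qs = np.linspace(0.0, 1.0, NUM_BINS + 1)[1:-1]  # interior quantiles only
    return {name: np.quantile(dataset_scores[name], qs) for name in REWARDS}


def bin_rewards(scores, edges):
    """Map each raw reward score to a bin index in 0..NUM_BINS-1."""
    return {name: int(np.digitize(scores[name], edges[name])) for name in REWARDS}


def condition_vector(bins):
    """One-hot encode each reward's bin and concatenate into one conditioning vector."""
    vec = np.zeros(len(REWARDS) * NUM_BINS)
    for i, name in enumerate(REWARDS):
        vec[i * NUM_BINS + bins[name]] = 1.0
    return vec


def high_reward_condition():
    """Inference-time condition: request the top bin of every reward."""
    return condition_vector({name: NUM_BINS - 1 for name in REWARDS})
```

In this sketch, training would pair each image and caption with `condition_vector(bin_rewards(scores, edges))`, and sampling would pass `high_reward_condition()` alongside the prompt. Because all rewards are supplied jointly, the generator sees how they co-vary in the data rather than being optimized toward any single score.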

Zhenjun Zhao

Nicolas Dufour

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

The default paradigm of post-training text-to-image generators includes post-hoc selection of generated images, and subsequent training with one reward model to align the generator to the reward, typi...

arxiv.org

Open-source fueled the LLM revolution, but Physical AI hasn't fully benefited from this flywheel yet. Today, we're launching kesai.eu, our mission to democratize robotics research! First milestone: training a frontier-level self-driving policy using significantly less data than typically required.

It's not just for training from scratch. Our new results show that MIRO functions beautifully as a post-training framework! Applying this multi-reward conditioning during the fine-tuning phase of an existing base model yields the exact same controllable alignment.

Why does MIRO work so reliably? Our paper introduces a theorem proving that conditioning on the joint reward distribution mathematically guarantees that the model steers toward high-reward regions while preserving sample diversity and avoiding single-metric hacking.

Are all rewards useful? Yes! Our new "leave-one-out" ablation shows that removing even a single reward drops performance. Even though these rewards are quite entangled, each one still provides unique, useful bits of information that the model needs to succeed.

Everything is fully open-sourced, including the codebase, the model, and all individual single-reward model variants!
🌐 Site: nicolas-dufour.github.io/miro
📄 Paper: arxiv.org/abs/2510.25897
🛠️ Git: github.com/nicolas-dufo...
🤗 HF: huggingface.co/nicolas-dufo...
🎨 Demo: huggingface.co/spaces/nicol...

The model is so fast and easy to use that I vibe-coded a small game with it in 1h 😅 Runs flawlessly on a consumer GPU if you're looking for a small local model to tinker with.