Check out our new work: MIRO
No more post-training alignment!
We integrate human alignment right from the start, during pretraining!
Results:
- 19x faster convergence
- 370x less compute
Explore the project: nicolas-dufour.github.io/miro/
Our work MIRO is accepted to #ICML2026 @icmlconf.bsky.social
We integrate human preferences directly during pretraining with multi-reward conditioning.
MIRO is 19x faster than baselines and 370x cheaper at inference!
Try out the models: huggingface.co/spaces/nicol...
See you in Seoul!
Very proud of our recent work; kudos to the team! Read @davidpicard.bsky.social's excellent post for more details, or the paper: arxiv.org/pdf/2502.21318
We introduce MIRO: a new paradigm for T2I model alignment integrating reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control.
- 19x faster convergence
- 370x fewer FLOPs than FLUX-dev
Thrilled to share that MIRO is accepted to ICML 2026 @icmlconf.bsky.social!
By training on the reward scores, we can simply condition the model on high rewards at inference time to steer it toward top-tier, aligned outputs.
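To make "condition the model on high rewards" concrete, here is a minimal, hypothetical sketch. It assumes each training image comes with scores from several reward models, that each score is quantized into a few bins, and that the bins are passed to the generator as extra conditioning tokens next to the prompt. The reward names, `NUM_BINS`, `RewardRange` and the helper functions are all illustrative assumptions, not MIRO's released code.

```python
"""Hypothetical sketch of multi-reward conditioning (not MIRO's actual code).

Assumption: reward scores are quantized into bins and turned into
conditioning tokens that accompany the text prompt.
"""
from __future__ import annotations

from dataclasses import dataclass

NUM_BINS = 5  # hypothetical number of quantization levels per reward


@dataclass
class RewardRange:
    low: float
    high: float

    def to_bin(self, score: float) -> int:
        # Clamp to the observed range, then map linearly onto [0, NUM_BINS - 1].
        clamped = min(max(score, self.low), self.high)
        frac = (clamped - self.low) / (self.high - self.low)
        return min(int(frac * NUM_BINS), NUM_BINS - 1)


def reward_tokens(scores: dict[str, float], ranges: dict[str, RewardRange]) -> list[str]:
    """Turn raw reward scores into discrete tokens such as 'aesthetic=4'."""
    return [f"{name}={ranges[name].to_bin(scores[name])}" for name in sorted(ranges)]


def training_condition(prompt: str, scores: dict[str, float],
                       ranges: dict[str, RewardRange]) -> tuple[str, list[str]]:
    # During pretraining, each sample is conditioned on its *own* scores,
    # so low-reward images stay in the data instead of being filtered out.
    return prompt, reward_tokens(scores, ranges)


def inference_condition(prompt: str, ranges: dict[str, RewardRange],
                        targets: dict[str, int] | None = None) -> tuple[str, list[str]]:
    # At inference, request the top bin for every reward by default, or pass
    # per-reward targets to trade one objective off against another.
    targets = targets or {}
    return prompt, [f"{name}={targets.get(name, NUM_BINS - 1)}" for name in sorted(ranges)]


if __name__ == "__main__":
    ranges = {"aesthetic": RewardRange(0.0, 10.0), "clip": RewardRange(0.0, 1.0)}
    print(training_condition("a red fox", {"aesthetic": 3.2, "clip": 0.91}, ranges))
    print(inference_condition("a red fox", ranges))
    print(inference_condition("a red fox", ranges, {"aesthetic": 2}))
```

Passing a lower target for one reward while leaving the others at the top bin is one way the "controllable trade-offs at inference time" could be exposed, under the same assumptions.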
We've updated our paper with some additional results!
arxiv.org/abs/2510.25897
Thread with all details coming soon!
The default paradigm of post-training text-to-image generators includes post-hoc selection of generated images, and subsequent training with one reward model to align the generator to the reward, typi...
Train once, align many rewards. MIRO achieves 19× faster convergence and 370× less compute than FLUX while reaching a GenEval score of 75, with controllable trade-offs at inference time.
Folks! If you are curious about the Generative Modeling via Drifting paper but find it difficult to understand, I wrote a different interpretation of it.
It's called: "An Expectation-Maximization interpretation of Generative Modeling via Drifting"
davidpicard.github.io/pdf/An_Expec...
arxiv.org/abs/2604.06129
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
This paper is the result of a lab-wide hackathon on an idea I've had for some time. It's probably the paper with the most authors I've ever worked on.
It's a CVPR Findings 2026 paper.
Thread below.
Final note: I'm (we're) tempted to organize a challenge on this topic as a workshop at a CV conference. ImageNet would be the only allowed source of images, and participants would compete for the bold numbers.
Do you think there would be people in for that? Do you think it would make for a nice competition?
David Picard
Nicolas Dufour
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a comp...