From an implementation standpoint, backprop through multiple steps of the sampling process turns out to be expensive (as always). I'm glad that we finally have a working implementation of activation checkpointing to solve the GPU memory bottleneck with Flux.2 github.com/anneharringt...
Code for “It’s never too late: Noise optimization for collapse recovery in trained diffusion models”, CVPR 2026 - anneharrington/divgen