MIRO isn't just for training from scratch. Our new results show that it also works beautifully as a post-training framework!
Applying its multi-reward conditioning while fine-tuning an existing base model yields the same controllable alignment we get when training with MIRO from scratch.