Inlay

Why does MIRO work so reliably? Our paper includes a theorem proving that conditioning on the joint reward distribution steers the model toward high-reward regions while preserving sample diversity and avoiding single-metric hacking.