Inlay

Specifically, from *generating rollouts*. RL trains on long traces (up to 32k tokens, averaging over 10k) across many iterations. The generator runs at near-peak power, while the trainer idles ~75% of the time waiting for rollouts, so 87% of Think post-training energy goes to generation.
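
To see how an idle trainer pushes the energy split toward generation, here is a rough back-of-the-envelope sketch. Only the ~75% idle fraction comes from the text. The node counts and the peak and idle power draws are hypothetical placeholders, so the result illustrates the mechanism and is not meant to reproduce the 87% figure.

```python
def generation_energy_share(gen_nodes, train_nodes, peak_power_w,
                            idle_power_w, trainer_idle_fraction):
    """Fraction of total energy used by rollout generation over a fixed
    wall-clock window, assuming generators draw peak power the whole time."""
    # Generators are assumed busy (near peak) for the entire window.
    gen_power = gen_nodes * peak_power_w
    # Trainers draw peak power while stepping and idle power while waiting.
    train_power = train_nodes * (
        (1 - trainer_idle_fraction) * peak_power_w
        + trainer_idle_fraction * idle_power_w
    )
    # The window length cancels, so average power ratios equal energy ratios.
    return gen_power / (gen_power + train_power)


# Hypothetical setup: 4 generator nodes per trainer node,
# 700 W at peak and 100 W idle per node (assumed values, not from the text).
share = generation_energy_share(
    gen_nodes=4, train_nodes=1,
    peak_power_w=700.0, idle_power_w=100.0,
    trainer_idle_fraction=0.75,  # the ~75% idle time mentioned above
)
print(f"generation share of energy: {share:.1%}")
```

Under these assumptions, two things drive the share up: generators outnumber trainers, and the trainer spends most of the window at idle draw. Changing either placeholder moves the result substantially.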