Specifically, from *generating rollouts*. RL trains on long traces (up to 32k tokens, avg >10k) across many iterations. The generator runs at near-peak power; the trainer idles ~75% of the time waiting for rollouts, so 87% of Think post-training energy goes to generation.
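To make the arithmetic behind that split concrete, here is a rough sketch of how a generation energy share could be estimated. Everything in it is an assumption for illustration: the function name, the GPU counts, and the peak and idle wattages are hypothetical and do not come from the post, so the result is not expected to reproduce the 87% figure. Only the 25% trainer busy fraction mirrors the "idles ~75%" claim.

```python
def generation_energy_share(gen_gpus, train_gpus, peak_w, idle_w, trainer_busy_frac):
    """Fraction of total energy drawn by generator GPUs over one wall-clock interval.

    Assumes (hypothetically) that generator GPUs sit at peak power the whole time,
    while trainer GPUs draw peak power only while busy and idle power otherwise.
    """
    gen_power = gen_gpus * peak_w
    # Time-weighted average draw per trainer GPU, scaled by the trainer GPU count.
    train_power = train_gpus * (
        trainer_busy_frac * peak_w + (1.0 - trainer_busy_frac) * idle_w
    )
    return gen_power / (gen_power + train_power)


# Illustrative inputs only: equal GPU counts, idle draw at 20% of peak,
# trainer busy 25% of the time (i.e. idle ~75%).
share = generation_energy_share(
    gen_gpus=8, train_gpus=8, peak_w=700.0, idle_w=140.0, trainer_busy_frac=0.25
)
print(f"generation share of energy: {share:.0%}")
```

With these made-up inputs the generator already takes well over half of the energy. The real share depends on how many GPUs are assigned to each role and on measured power draw, neither of which is stated here.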