
Reasoning models are far more expensive to post-train. For our 32B model, post-training the Think variant takes 17x more datacenter energy than post-training the Instruct variant, and almost all of that gap comes from reinforcement learning.