Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?
Our new paper shows that AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior!
shorturl.at/siUYI
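To make the "script as code" idea concrete, here is a toy sketch of what a pedestrian script might look like in Python. Everything in it (the `Observation` fields, the action names, the `crosswalk_script` function) is hypothetical and made up for illustration; it is not taken from the paper and does not claim to reflect the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of what a pedestrian can see (illustrative only)."""
    at_curb: bool
    walk_signal: bool
    car_approaching: bool


def crosswalk_script(obs: Observation) -> str:
    """Predict the pedestrian's next action from a fixed, simple script."""
    if not obs.at_curb:
        return "walk_to_curb"   # first get to the crossing point
    if obs.walk_signal and not obs.car_approaching:
        return "cross"          # signal is on and the road is clear
    return "wait"               # otherwise stay put


prediction = crosswalk_script(Observation(at_curb=True, walk_signal=True, car_approaching=False))
```

The point of the sketch is only that a short program can stand in for a prediction of behavior without explicitly representing beliefs or goals.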
Full per-stage breakdown, methodology, and discussion in the paper: arxiv.org/abs/2605.01158.
Most LLM environmental reporting covers only the final pretraining runs. For Olmo 3, we measured every stage across all four variants (7B and 32B, instruct and reasoning) and found that 82% of the compute went to development, all before the final runs 😱
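A quick way to read the 82% figure: if a fraction d of compute goes to development, total compute is 1 / (1 - d) times the final runs alone. The sketch below is just that arithmetic; the function name is made up for illustration, and it assumes "development" and "final runs" together account for all compute.

```python
def total_to_final_ratio(dev_fraction: float) -> float:
    """Return total compute divided by final-run compute.

    Assumes compute splits into development (dev_fraction) and final runs
    (1 - dev_fraction) with nothing else.
    """
    if not 0.0 <= dev_fraction < 1.0:
        raise ValueError("dev_fraction must be in [0, 1)")
    return 1.0 / (1.0 - dev_fraction)


olmo3_ratio = total_to_final_ratio(0.82)   # about 5.6x the final runs
half_ratio = total_to_final_ratio(0.50)    # 2x the final runs
```

Under that assumption, reporting only the final runs would capture a bit less than a fifth of the total.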
Across the full pipeline, we estimate ~4,251 tCO2eq and ~15,887 kL of water for the Olmo 3 series. That is comparable to the emissions from nearly 850 US homes' annual electricity use, or 140 years of water use for the average person in the US.
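For a sense of scale, the per-home and per-person figures implied by those comparisons can be backed out with simple division. This is a rough sketch using only the numbers in the post; it assumes the home comparison refers to emissions, and since the post says "nearly 850", the results are approximate.

```python
# Figures quoted in the post
TOTAL_TCO2EQ = 4251.0      # tonnes CO2-equivalent
TOTAL_WATER_KL = 15887.0   # kilolitres
HOMES = 850                # "nearly 850 US homes" (approximate)
PERSON_YEARS = 140         # "140 years of water use"

# Implied per-unit values (derived here, not stated in the post)
tco2_per_home_year = TOTAL_TCO2EQ / HOMES
water_kl_per_person_year = TOTAL_WATER_KL / PERSON_YEARS
water_l_per_person_day = water_kl_per_person_year * 1000 / 365

print(f"{tco2_per_home_year:.2f} tCO2eq per home per year")
print(f"{water_kl_per_person_year:.1f} kL per person per year")
print(f"{water_l_per_person_day:.0f} L per person per day")
```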
Jacob Morrison
Kunal Jha
We also estimated our total water use: ~15,887 kL, none of which was from datacenter cooling. Our cluster uses closed-loop cooling, so all of it came from power generation. Just changing the grid would more than double it, and evaporative cooling would nearly double it again:
That 82% is up from the ~50% we previously reported for Olmo 1 and 2 pretraining: arxiv.org/abs/2503.05804
Specifically, from *generating rollouts*. RL trains on long traces (up to 32k tokens, avg >10k) across many iterations. The generator runs at near-peak power; the trainer idles ~75% of the time waiting for rollouts, so 87% of Think post-training energy goes to generation.
Reasoning models are far more expensive to post-train. For our 32B model, post-training our Think model takes 17x more datacenter energy than post-training the Instruct variant, and almost all of that gap is reinforcement learning.
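Combining the 17x figure with the 87% generation share reported elsewhere in this thread gives a rough decomposition. The sketch below assumes the 87% share applies to the 32B Think model specifically, which the posts do not state outright; the variable names are made up for illustration.

```python
# Figures quoted in the thread
THINK_VS_INSTRUCT = 17.0   # Think post-training energy / Instruct post-training energy (32B)
GENERATION_SHARE = 0.87    # share of Think post-training energy spent on rollout generation

# Express each part of Think post-training in units of
# "one full Instruct post-training run" (assumes the 87% applies to 32B).
generation_units = THINK_VS_INSTRUCT * GENERATION_SHARE
everything_else_units = THINK_VS_INSTRUCT * (1.0 - GENERATION_SHARE)

print(f"rollout generation: ~{generation_units:.1f}x Instruct post-training")
print(f"everything else:    ~{everything_else_units:.1f}x Instruct post-training")
```

Under that assumption, rollout generation alone would cost roughly 15 times the entire Instruct post-training run.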
Every stage has its own experimentation cycle, with dev fractions running from 69% (pretraining) to >95% (mid-training, DPO). Concurrent work from Epoch AI estimates frontier labs at 77–90%: epoch.ai/gradient-upd...
Thank you to co-authors @natolambert.bsky.social, @valentinapy.bsky.social, @jacobcares.bsky.social, Sander Land, @nlpnoah.bsky.social, @hanna-nlp.bsky.social!
Read more in the paper here (arXiv soon!): github.com/allenai/rewa...
Dataset, leaderboard, and models here: huggingface.co/collections/...