PhD student @ UW, research @ Ai2

Jacob Morrison

Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows that AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior! shorturl.at/siUYI%F0%9F%...

Full per-stage breakdown, methodology, and discussion in the paper: arxiv.org/abs/2605.01158.

8mo

1mo

Most LLM environmental reporting covers only the final pretraining runs. For Olmo 3, we measured every stage across all four variants: 7B and 32B, instruct and reasoning, and found that 82% of the compute went to development, all before the final runs 😱

Across the full pipeline we estimate ~4,251 tCO2eq and ~15,887 kL of water for the Olmo 3 series, roughly equivalent to the annual electricity use of nearly 850 US homes and 140 years of water use for the average person in the US, respectively.
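As a quick sanity check on those comparisons, the sketch below backs out the per-home and per-person figures implied by the totals in the post. It uses only the four numbers quoted above; the variable and function names are illustrative, and since "nearly 850" and "140 years" are rounded, the results are approximate rather than the paper's exact conversion factors.

```python
# Figures quoted in the post (approximate, as stated there).
TOTAL_EMISSIONS_TCO2EQ = 4251   # tonnes of CO2-equivalent
TOTAL_WATER_KL = 15887          # kiloliters
HOMES_EQUIVALENT = 850          # "nearly 850 US homes"
PERSON_YEARS_EQUIVALENT = 140   # "140 years of water use"


def implied_per_unit(total: float, units: float) -> float:
    """Return the per-unit value implied by a total and its stated equivalent."""
    return total / units


# Emissions attributed to one home's annual electricity, implied by the post.
tco2eq_per_home_year = implied_per_unit(TOTAL_EMISSIONS_TCO2EQ, HOMES_EQUIVALENT)

# Water use per person per year (kL), and per day (liters; 1 kL = 1000 L).
kl_per_person_year = implied_per_unit(TOTAL_WATER_KL, PERSON_YEARS_EQUIVALENT)
liters_per_person_day = kl_per_person_year * 1000 / 365

print(f"Implied emissions per home-year: {tco2eq_per_home_year:.2f} tCO2eq")
print(f"Implied water per person-year:   {kl_per_person_year:.1f} kL")
print(f"Implied water per person-day:    {liters_per_person_day:.0f} L")
```

The implied values come out to roughly 5 tCO2eq per home-year and a bit over 300 L per person per day.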

Jacob Morrison

Kunal Jha

We also estimated our total water use: ~15,887 kL, none of which was from datacenter cooling. Our cluster uses closed-loop cooling, so all of it came from power generation. Just changing the grid would more than double it, and evaporative cooling would nearly double it again:

That 82% is up from the ~50% we previously reported for Olmo 1 and 2 pretraining: arxiv.org/abs/2503.05804
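To make the jump from ~50% to 82% concrete: if a fraction d of all compute goes to development, the final runs account for the remaining 1 - d, so total compute is 1 / (1 - d) times the final runs. The sketch below applies that identity to the two fractions quoted in the posts; the function name is illustrative and not taken from the paper.

```python
def total_to_final_ratio(dev_fraction: float) -> float:
    """Total compute divided by final-run compute, given the development share.

    If development takes `dev_fraction` of the total, final runs take
    (1 - dev_fraction), so total / final = 1 / (1 - dev_fraction).
    """
    if not 0.0 <= dev_fraction < 1.0:
        raise ValueError("dev_fraction must be in [0, 1)")
    return 1.0 / (1.0 - dev_fraction)


# Development shares quoted in the posts.
olmo3_ratio = total_to_final_ratio(0.82)      # Olmo 3, full pipeline
earlier_ratio = total_to_final_ratio(0.50)    # ~50% reported for Olmo 1 and 2 pretraining

print(f"Olmo 3: total is about {olmo3_ratio:.1f}x the final runs")
print(f"Olmo 1/2 pretraining: total is about {earlier_ratio:.1f}x the final runs")
```

Under this reading, reporting only the final runs would capture about half of the earlier total but less than a fifth of the Olmo 3 total.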

Specifically, from *generating rollouts*. RL trains on long traces (up to 32k tokens, avg >10k) across many iterations. The generator runs at near-peak power; the trainer idles ~75% of the time waiting for rollouts, so 87% of Think post-training energy goes to generation.
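A toy model can show why generation dominates when the trainer mostly waits. The sketch below is not the paper's methodology: the GPU counts and wattages are hypothetical placeholders, and it assumes generators draw peak power continuously while trainers alternate between peak and idle draw. It is not expected to reproduce the 87% figure, which depends on the real hardware split and measured power.

```python
def generation_energy_share(
    gen_gpus: int,
    train_gpus: int,
    peak_watts: float,
    idle_watts: float,
    trainer_busy_fraction: float,
) -> float:
    """Fraction of total energy used by rollout generation in a simple RL setup.

    Assumptions (all hypothetical): generators run at peak power all the time;
    trainers draw peak power while busy and idle power while waiting.
    """
    gen_power = gen_gpus * peak_watts
    trainer_avg_watts = (
        trainer_busy_fraction * peak_watts
        + (1.0 - trainer_busy_fraction) * idle_watts
    )
    train_power = train_gpus * trainer_avg_watts
    return gen_power / (gen_power + train_power)


# Hypothetical example: equal GPU counts, trainer busy 25% of the time
# (i.e. idle ~75%, as in the post). Wattages are placeholders.
share = generation_energy_share(
    gen_gpus=8, train_gpus=8, peak_watts=700.0, idle_watts=100.0,
    trainer_busy_fraction=0.25,
)
print(f"Generation share of energy: {share:.0%}")
```

Giving generation more GPUs than training, or lowering the trainer's idle draw, pushes the share higher in this model.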

Reasoning models are far more expensive to post-train. For our 32B model, post-training our Think model takes 17x more datacenter energy than post-training the Instruct variant, and almost all of that gap is reinforcement learning.

Every stage has its own experimentation cycle, with dev fractions running from 69% (pretraining) to >95% (mid-training, DPO). Concurrent work from Epoch AI estimates frontier labs at 77–90%: epoch.ai/gradient-upd...

1mo

Thank you to co-authors @natolambert.bsky.social, @valentinapy.bsky.social, @jacobcares.bsky.social, Sander Land, @nlpnoah.bsky.social, @hanna-nlp.bsky.social! Read more in the paper here (arXiv soon!): github.com/allenai/rewa... Dataset, leaderboard, and models here: huggingface.co/collections/...

1mo