Compute-to-train-loss scaling laws guide LLM pretraining, but how do training and validation losses map to downstream task loss? What factors shape these laws? We analyze loss-to-loss scaling laws, extending prior work beyond a single architectural setting to a range of configurations. 2/8