Post
New preprint out! 🎉 How does LLM training loss translate to downstream performance? We show that pretraining data and tokenizer shape loss-to-loss scaling, while architecture and other factors play a surprisingly minor role! brendel-group.github.io/llm-line/ 🧵1/8
Feb 18, 2025
Prasanna Mayilvahanan