Post
Check out our newest paper! As always, it was super fun working on this with @prasannamayil.bsky.social
Feb 18, 2025
Thaddäus Wiedemer
New preprint out! 🎉 How does LLM training loss translate to downstream performance? We show that pretraining data and tokenizer shape loss-to-loss scaling, while architecture and other factors play a surprisingly minor role! brendel-group.github.io/llm-line/ 🧵1/8
Feb 18, 2025
Prasanna Mayilvahanan