Check out our newest paper! As always, it was super fun working on this with @prasannamayil.bsky.social

New preprint out! 🎉 How does LLM training loss translate to downstream performance? We show that pretraining data and tokenizer shape loss-to-loss scaling, while architecture and other factors play a surprisingly minor role! brendel-group.github.io/llm-line/ 🧵1/8