
📢Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to date. We ran 774 exps (10M-8B params, 400+ languages) to answer:

🌍 Is scaling diff by lang?
🧙‍♂️ Can we model the curse of multilinguality?
⚖️ Pretrain vs finetune from checkpoint?
🔀 X-lingual transfer scores across langs?

1/🧵