🚀 Introducing PICLe: a framework for in-context named-entity detection (NED) using pseudo-annotated demonstrations.
🎯 No human labeling needed—yet it outperforms few-shot learning with human annotations!
#AI #NLProc #LLMs #ICL #NER
Lots of great news out of the EPFL NLP lab these last few weeks. We'll be at @iclr-conf.bsky.social and @naaclmeeting.bsky.social in April / May to present some of our work on training dynamics, model representations, reasoning, and AI democratization. Come chat with us at the conferences!
1/ 🌍 How does mixing data from hundreds of languages affect LLM training?
In our new paper, "Revisiting Multilingual Data Mixtures in Language Model Pretraining", we revisit core assumptions about multilinguality using 1.1B-3B parameter models trained on up to 400 languages.
🧵👇