A series of state-of-the-art, open source and transparent
foundation models for European languages
Open Euro LLM
As of this morning:
425.49B tokens seen
4.25% completed
This eager reader wants more input, one token at a time.
Follow along.
(2/2)
#PreTraining #LLM #MultilingualAI #TransparentAI
#goOpenEuroLLM
Pretraining launched!
Our 9B/10TT baby model is taking its first steps on Leonardo (CINECA).
Everyone involved is eager to see the results of the effort it took to get here, and to share them.
And we are already pushing hard toward the next cycle. 🦾
#goOpenEuroLLM
Also today, learn more about the impact of benchmark contamination by visiting the poster of our colleagues from the universities of Helsinki and Turku and the ELLIS Institute Finland.
HPLT is one of the datasets we are sharing in our world-readable catalogue across HPCs. Interesting talk at #LREC2026 in 15 min in room Menorca 1 at 16:20!!!
Wrapping up our 3rd general meeting, hosted by
AI Sweden in sunny Stockholm ☀️
A full room makes the final decisions before training the first OpenEuroLLM model. Sharing updates, ideas, and future plans.
Two more days of tight collaboration.
Full speed mode.
#goOpenEuroLLM
Quite a nice "representation" of the OpenEuroLLM crowd will be at the International Conference on Learning Representations (ICLR) this week.
On Friday 24, come to the poster "OpenThoughts: Data Recipes for Reasoning Models", work partially supported by our project, and meet us!
Input, more input 🤖⚡
Just like Johnny 5 in Short Circuit, our baby model is reading every single token from its pretraining dataset.
So far: 10 trillion tokens, 36 languages + code & math as their own "languages" 💻
We're tracking progress & sharing it openly
(1/2)
Experimenting with model-based annotation for better data selection? A candidate to consider is propella-1, a fully open-source multilingual and multi-property annotator partially funded by #OpenEuroLLM.
Models, annotations and paper ready! See: huggingface.co/collections/...
All ready to share information about #OpenEuroLLM with the #LREC2026 crowd. Let's talk data, infra, evals and open multilingual LLMs together! Come to booth #5 in poster area 1, Elyxir Building.
#multingualLLMs #openLLMs #diverseLLMs #safeLLMs
One year of OpenEuroLLM!
🇪🇺 We're building Europe's next-gen open-source LLMs to boost digital sovereignty.
More about our achievements and next steps for infrastructure, data, models and evaluation at openeurollm.eu/blog/first-y....
Year 2 = full speed ahead.
Go #OpenEuroLLM!