I'm part of this! There's also a paper: arxiv.org/abs/2503.10267
We also release MultiHPLT, with a total of 1275 language pairs (not English-centric). opus.nlpl.eu/MultiHPLT/en...
We will shortly be adding document alignments to this data. This means that we will release sets of aligned complete documents, with additional details of the sentence alignments that we found in the documents.
** New parallel data set ** We've just released HPLT v2.0, a parallel data set of 50 languages paired with English, 380M sentence pairs in total. Extracted from the Internet Archive and Common Crawl: hplt-project.org/datasets/v2.0
EAMT best thesis award - closes on January 31st. Completed an MT-related PhD in 2024 in Europe, Africa or the Middle East? Then why not submit your thesis? eamt.org/2024/11/28/t...
MT Summit 2025 - deadline extended!
The deadline for all papers (technical/user/translator/products/projects) has been extended to February 10th. MT Summit will be in Geneva, June 23--27. mtsummit2025.unige.ch/index.html
Can't believe the WMT General MT shared task is 20 years old!
A space that combines petabytes of natural language data with large-scale model training
Big news from WMT! We are expanding beyond MT and launching a new multilingual instruction shared task. Our goal is to foster truly multilingual LLM evaluation and best practices in automatic and human evaluation. Join us and build the winning multilingual system!
www2.statmt.org/wmt25/multil...
Barry Haddow
Call for participation: We just opened the registration for this year's MT Marathon in August in Helsinki, Finland: blogs.helsinki.fi/language-tec..., featuring:
- Ayodele Awokoya
- Wilker Aziz
- Marta Costa-Jussa
- Barry Haddow
- Amit Moryossef
- Sara Papi
- Jörg Tiedemann
- Marco Turchi