
I'm part of this! There's also a paper: arxiv.org/abs/2503.10267

**New parallel data set.** We've just released HPLT v2.0, a parallel data set pairing 50 languages with English, with 380M sentence pairs in total. It was extracted from the Internet Archive and Common Crawl: hplt-project.org/datasets/v2.0

A space that combines petabytes of natural language data with large-scale model training