For full technical details and the compliance datasheet, see our preprint at arxiv.org/abs/2510.13996
As for German-specific models trained on this data... stay tuned 👀
Large language model development relies on large-scale training corpora, yet most contain data of unclear licensing status, limiting the development of truly open models. This problem is exacerbated f...