For full technical details + compliance Datasheet, see our preprint @ arxiv.org/abs/2510.13996. As for German-specific models trained on this data... stay tuned 👀
7mo
Large language model development relies on large-scale training corpora, yet most contain data of unclear licensing status, limiting the development of truly open models. This problem is exacerbated f...
arxiv.org
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
Webis Group