We just released "German Commons", the largest openly licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use. huggingface.co/datasets/coral-nlp/german-commons
