6/ Paper, dataset, and models here: arxiv.org/abs/2606.19468 huggingface.co/collections/... github.com/johnsont4/na...
The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of...
arxiv.org
Characterizing Narrative Content in Web-scale LLM Pretraining Data
Teagan Johnson