PhD student @ CU Boulder studying NLP and cultural analytics
https://johnsont4.github.io/
Teagan Johnson
1/
LLMs learn narrative from their pretraining data, but what narrative content is actually in there? It turns out narrative is wildly unevenly distributed across sources and topics. New preprint with @andrewpiper.bsky.social @elliottash.bsky.social @mariaa.bsky.social:
Also check out the Hugging Face page to try the Agency and Setting elements of NarraBERT! You can enter any text you'd like and it will return the feature values across the 9 agency+setting dimensions: huggingface.co/spaces/teagr...
What should academics be doing right now?
I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.
davidbau.github.io/poetsandnurs...
It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
2/
Narrative isn't binary. Drawing on narrative theory, we treat narrativity as a continuous, multidimensional property and score text on 12 dimensions across agency, setting, and eventful features.
3/
To do this at scale, we built NarraBERT, a RoBERTa-based model validated against human and LLM annotations, and released NarraDolma: ~2.9M passages across ~785K Dolma documents, each scored on all 12 narrative dimensions.
4/
The headline: sources and topics have distinct narrative fingerprints. Reddit is dense with interiority but low on conflict, while “Crime & Law” texts are the reverse. No single source covers the whole narrative spectrum, not even books!
5/
And variation within a source is large, so source-level labels are too coarse to capture it. Upweighting a "narrative" source like Reddit boosts interiority while starving the model of storyworld texture and events. Curation choices have narrative consequences that current pipelines don't measure.
As you may be aware, the Trump administration is planning to dismantle NCAR. I can't say enough about how pivotal NCAR is to worldwide public safety, and also to the mentorship of future scientists.
Today is the NSF-imposed deadline for submitting public feedback to congresspeople (see link below).
Please consider submitting feedback here: secure.ucs.org/a/2025-prote...
You can use the template message or customize it. It takes less than 1 minute to submit. You can learn more from this recent NYT article: www.nytimes.com/2026/03/13/c...
6/
Paper, dataset, and models here: arxiv.org/abs/2606.19468
huggingface.co/collections/...
github.com/johnsont4/na...