Digital humanities researchers often care about fine-grained similarity based on narrative elements like plot or tone, which don’t necessarily correlate with surface-level textual features. Can embedding models capture this? We study this in the context of fanfiction!
I’ll be presenting this work in **2 hours** at EMNLP’s Gather Session 3. Come by to chat about fanfiction, literary notions of similarity, long-context modeling, and consent-focused data collection!
7mo
We introduce FicSim, a dataset of 90 recently written long-form fanfics from Archive of Our Own. We *reach out to the authors for permission* to use each work and prioritize continual, informed author consent. Fics range in length from 10K to 400K+ words.
Natasha Johnson
Looking back and forth between Barthes, Sedgwick, and Hirsch trying to interpret a Star Trek scene when I'm 90% sure the explanation is just "the actor had a crush on his costar"
7mo
All selected fanfiction has detailed metadata and author-generated tags describing the fanfic content. Informed by fan studies and digital humanities literature, we classify these into 12 categories to construct gold labels for a fine-grained semantic similarity task.
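As a rough illustration of how tag-based gold labels for a similarity task could be derived, here is a minimal sketch that scores two fics by the overlap of their tags within one category. This is not necessarily how FicSim builds its labels: the Jaccard measure, the category names (`theme`, `tone`), and the example tags are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two tag collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def gold_similarity(tags_x, tags_y, category):
    """Per-category similarity between two fics (hypothetical scoring rule)."""
    return jaccard(tags_x.get(category, ()), tags_y.get(category, ()))


# Hypothetical fics: tags already sorted into (assumed) categories.
fic_a = {"theme": {"grief", "found family"}, "tone": {"angst"}}
fic_b = {"theme": {"found family", "redemption"}, "tone": {"fluff"}}

theme_score = gold_similarity(fic_a, fic_b, "theme")  # one shared tag out of three
tone_score = gold_similarity(fic_a, fic_b, "tone")    # no shared tags
```

Scoring each category separately keeps the aspects apart, so two fics can be close in theme while differing in tone.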
Even strong embedding models over-index on surface features—for every model tested, similarity scores are more reflective of author or fandom than semantic aspects like theme or characterization. This is true even if models are explicitly instructed to focus on these aspects!
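One way to picture "more reflective of author or fandom" is to compare, for each aspect, how much higher embedding similarity is for pairs that share a label than for pairs that do not. The sketch below does this on toy vectors. It is an illustrative assumption rather than the paper's evaluation protocol: the `similarity_gap` helper, the embeddings, and the labels are all made up.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_gap(embeddings, labels):
    """Mean cosine similarity of same-label pairs minus different-label pairs.

    Assumes at least one pair of each kind exists (otherwise division by zero).
    """
    same, diff = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (same if labels[i] == labels[j] else diff).append(sim)
    return sum(same) / len(same) - sum(diff) / len(diff)


# Toy data: the vectors cluster by fandom, not by theme.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
fandom = ["A", "A", "B", "B"]
theme = ["x", "y", "x", "y"]

fandom_gap = similarity_gap(embeddings, fandom)
theme_gap = similarity_gap(embeddings, theme)
```

On this toy data the fandom gap is larger than the theme gap, which is the pattern the post describes: similarity tracks the surface grouping more than the semantic one.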
3mo
Unsurprising: Using longer words makes female authors more “literary.” Surprising: The opposite is true for male authors. For more cool plots + findings, take a look at my #CHR2025 paper exploring the role of form vs gender in the classification of genre & literary fiction: doi.org/10.63744/Ztw...
7mo
This was joint work with @abertsch.bsky.social, Maria-Emil Deal, and @strubell.bsky.social. Paper: arxiv.org/abs/2510.20926; dataset: huggingface.co/datasets/fic...
7mo