Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀? ✨ We introduce QUDsim to quantify discourse similarities beyond lexical, syntactic, and content overlap.

The “LLM vibe” is real even when the actual content is different. Across several genres, from creative writing to obituaries, different LLMs generate more homogeneous discourse than humans do.

Check out our paper for more! 📝: arxiv.org/abs/2606.12790 🐙: github.com/AlliteraryAl... 🔗: alliteraryalligator.github.io/GENIE/ This was in collaboration with: @manyawadhwa.bsky.social @asher-zheng.bsky.social @gregdnlp.bsky.social and @jessyjli.bsky.social

Apr 21, 2025

Are LLM-generated stories novel? They can have unique characters and cliché plots, or the other way around. A holistic score doesn’t help distinguish the two 😔. Meet GENIE 🧞 – a fine-grained novelty metric that tells you where and why a response is original!

GENIE captures novelty w.r.t. a user-defined reference population, along task-specific features (e.g. plot or setting). For a target response, GENIE:
⚙️ Automatically derives features
🌐 Uses a population of LLM-generated responses
❓ Extracts features via prompt-specific questions
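To make the idea of per-feature novelty against a reference population concrete, here is a minimal sketch. It is not GENIE's implementation: the function name `feature_novelty`, the exact-match comparison of feature values, and the "1 minus share of the population" score are all assumptions chosen for illustration.

```python
def feature_novelty(target: dict[str, str],
                    population: list[dict[str, str]]) -> dict[str, float]:
    """Score each feature of `target` by how rare its value is in `population`.

    Illustrative assumption: features are already extracted as short strings,
    and novelty is 1 minus the share of the population with the same value.
    """
    scores = {}
    for feature, value in target.items():
        # Collect the values the reference population has for this feature.
        values = [resp[feature] for resp in population if feature in resp]
        if not values:
            scores[feature] = 1.0  # nothing to compare against
            continue
        scores[feature] = 1.0 - values.count(value) / len(values)
    return scores


# Toy data, invented for this example.
population = [
    {"character": "robot", "plot": "quest"},
    {"character": "child", "plot": "quest"},
    {"character": "robot", "plot": "heist"},
    {"character": "wizard", "plot": "quest"},
]
target = {"character": "cactus", "plot": "quest"}
print(feature_novelty(target, population))  # {'character': 1.0, 'plot': 0.25}
```

The point of the sketch is only that a response can score high on one feature (character) and low on another (plot), which a single holistic score would blur together.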

How does GENIE compare to existing novelty metrics? We tested this using minimal pairs of creative writing responses differing by one feature (e.g. plot). Key findings:
🧱 Many holistic metrics are not paraphrase-robust
🎯 GENIE is sensitive to interventions & paraphrase-robust

You can explore the questions generated per prompt and the contents of the population to see what makes responses novel. For example, the story about a cactus wanting to become a raindrop scores high on character originality. 🔗: alliteraryalligator.github.io/GENIE/

QUDsim assigns a similarity score between two documents. It works by considering to what extent one document answers another's QUDs, and vice versa. Segment alignments between the texts can also be derived.
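A rough sketch of the bidirectional idea, under stated assumptions: this is not the QUDsim code. The `answers` judge is a hypothetical stand-in (in practice this kind of check would likely be model-based), and combining the two directions with a harmonic mean is an illustrative choice, not something confirmed here.

```python
from typing import Callable, Sequence


def coverage(quds: Sequence[str], segments: Sequence[str],
             answers: Callable[[str, str], bool]) -> float:
    """Fraction of `quds` answered by at least one of `segments`."""
    if not quds:
        return 0.0
    answered = sum(any(answers(q, s) for s in segments) for q in quds)
    return answered / len(quds)


def symmetric_similarity(quds_a: Sequence[str], segs_a: Sequence[str],
                         quds_b: Sequence[str], segs_b: Sequence[str],
                         answers: Callable[[str, str], bool]) -> float:
    """Combine both directions: does B answer A's QUDs, and does A answer B's?"""
    a_by_b = coverage(quds_a, segs_b, answers)
    b_by_a = coverage(quds_b, segs_a, answers)
    if a_by_b + b_by_a == 0:
        return 0.0
    # Harmonic mean (an assumption): high only when both directions are high.
    return 2 * a_by_b * b_by_a / (a_by_b + b_by_a)


# Toy judge backed by a hand-written lookup table, invented for this example.
answered_pairs = {("q1", "b2"), ("q3", "a1")}
judge = lambda q, s: (q, s) in answered_pairs

score = symmetric_similarity(["q1", "q2"], ["a1"], ["q3"], ["b1", "b2"], judge)
print(round(score, 3))  # 0.667
```

The (QUD, segment) pairs for which the judge returns true are also the raw material for segment alignments: each pair links the segment that raised the question to the segment that answers it.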

Check out our paper for more results and analysis! 📝 arxiv.org/abs/2504.09373 🐙 github.com/AlliteraryAl... This was a fun collaboration with @yatingwu.bsky.social @asher-zheng.bsky.social @manyawadhwa.bsky.social @gregdnlp.bsky.social @jessyjli.bsky.social

Apr 21, 2025

Ramya Namuduri

As large language models become increasingly capable at various writing tasks, their weakness at generating unique and creative content becomes a major liability. Although LLMs have the ability to gen...

arxiv.org

QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
