The “LLM vibe” is real even when the actual content is different. Across several genres, from creative writing to obituaries, different LLMs generate more homogeneous discourse than humans do.
Check out our paper for more!
📝: arxiv.org/abs/2606.12790
🐙: github.com/AlliteraryAl...
🔗: alliteraryalligator.github.io/GENIE/
This was in collaboration with: @manyawadhwa.bsky.social @asher-zheng.bsky.social @gregdnlp.bsky.social and @jessyjli.bsky.social
Are LLM-generated stories novel? They can have unique characters and cliché plots, or the other way around. A holistic score doesn’t help distinguish the two 😔.
Meet GENIE 🧞 – a fine-grained novelty metric that tells you where and why a response is original!
GENIE captures novelty w.r.t. a user-defined reference population, along task-specific features (e.g. plot or setting).
For a target response, GENIE:
⚙️ Automatically derives features
🌐 Uses a population of LLM-generated responses
❓ Extracts features via prompt-specific questions
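To make the idea of per-feature novelty against a reference population concrete, here is a toy sketch. It is not GENIE's implementation: the function name `feature_novelty`, the exact-string matching of answers, and the example data are all assumptions made for illustration. It only shows how scoring each feature separately can reveal where a response is original.

```python
from collections import Counter


def feature_novelty(target, population):
    """Toy per-feature novelty (hypothetical, not GENIE's actual scoring).

    `target` maps feature name -> extracted answer for one response.
    `population` is a list of such mappings for the reference responses.
    Novelty for a feature is the share of reference responses whose
    answer differs from the target's (exact match after normalizing).
    """
    scores = {}
    for feature, answer in target.items():
        answers = [
            resp[feature].strip().lower()
            for resp in population
            if feature in resp
        ]
        if not answers:
            scores[feature] = None  # no reference data for this feature
            continue
        same = Counter(answers)[answer.strip().lower()]
        scores[feature] = 1 - same / len(answers)
    return scores


# Made-up reference population and target, for illustration only.
population = [
    {"character": "detective", "plot": "murder mystery"},
    {"character": "detective", "plot": "heist"},
    {"character": "wizard", "plot": "murder mystery"},
    {"character": "detective", "plot": "murder mystery"},
]
target = {"character": "cactus", "plot": "murder mystery"}

scores = feature_novelty(target, population)
print(scores)
```

In this toy data the target's character is unseen in the population while its plot is the most common one, so a single holistic score would blur the two. A realistic version would need a softer notion of "same answer" than exact string equality.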
How does GENIE compare to existing novelty metrics? We tested this using minimal pairs of creative writing responses differing by one feature (e.g. plot).
Key findings:
🧱 Many holistic metrics are not paraphrase-robust
🎯 GENIE is sensitive to interventions & paraphrase-robust
You can explore the questions generated per prompt and contents of the population to see what makes responses novel. For example, the story about a cactus wanting to become a raindrop scores high on character originality.
🔗: alliteraryalligator.github.io/GENIE/
QUDsim assigns a similarity score between two documents. It works by considering to what extent one document answers another's QUDs, and vice versa. Segment alignments between the texts can also be derived.
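A rough sketch of the "answers each other's QUDs" idea, under loud assumptions: the names `directional_coverage`, `qud_similarity`, and `toy_answers` are hypothetical, the keyword check stands in for whatever judgment decides if a document answers a QUD, and averaging the two directions is an illustrative choice rather than the paper's formula.

```python
def directional_coverage(quds, other_doc, answers):
    """Fraction of one document's QUDs that the other document answers."""
    if not quds:
        return 0.0
    return sum(1 for q in quds if answers(other_doc, q)) / len(quds)


def qud_similarity(doc_a, quds_a, doc_b, quds_b, answers):
    """Symmetric score: mean of both coverage directions (an assumption)."""
    a_in_b = directional_coverage(quds_a, doc_b, answers)
    b_in_a = directional_coverage(quds_b, doc_a, answers)
    return (a_in_b + b_in_a) / 2


def toy_answers(doc, qud):
    """Stand-in judge: a QUD is (question, keyword); 'answered' means
    the keyword occurs in the document. Purely illustrative."""
    _question, keyword = qud
    return keyword in doc.lower()


# Made-up documents and QUDs, for illustration only.
doc_a = "The hero leaves home and finds a hidden map."
doc_b = "A traveler leaves home but loses everything."
quds_a = [("Why does the protagonist set out?", "leaves home"),
          ("What do they find?", "map")]
quds_b = [("Why does the protagonist set out?", "leaves home"),
          ("What do they lose?", "loses")]

score = qud_similarity(doc_a, quds_a, doc_b, quds_b, toy_answers)
print(score)
```

Here each document answers one of the other's two QUDs, so the two stories share discourse structure (someone leaves home) even though their surface content differs.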
Check out our paper for more results and analysis!
📝 arxiv.org/abs/2504.09373
🐙 github.com/AlliteraryAl...
This was a fun collaboration with @yatingwu.bsky.social @asher-zheng.bsky.social @manyawadhwa.bsky.social @gregdnlp.bsky.social @jessyjli.bsky.social
Ramya Namuduri
As large language models become increasingly capable at various writing tasks, their weakness at generating unique and creative content becomes a major liability. Although LLMs have the ability to gen...