
How does GENIE compare to existing novelty metrics? We tested this using minimal pairs of creative writing responses that differ in a single feature (e.g., plot).

Key findings:
🧱 Many holistic metrics are not robust to paraphrasing
🎯 GENIE is sensitive to feature interventions and robust to paraphrasing