Would you realize if the book you were reading was written by AI? What if it was humanized to remove AI-speak?
We find that even without using stylistic cues (e.g., word choice or sentence structure), narrative choices alone give AI fiction away!
Work done with amazing collaborators: Tiasa, @harveylederman.bsky.social, @jessyjli.bsky.social, and @gregdnlp.bsky.social 🙌
Hello world 👋
My first paper at UT Austin!
We ask: what happens when medical “evidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful?
We show that frontier LLMs accept counterfactual medical evidence at face value. 🧵
Unlike creativity tests for humans (e.g., Alternate Uses Test), CREATE:
✅ Requires reasoning over parametric knowledge
✅ Is objectively verifiable
This makes it closer to real creativity tasks like brainstorming or hypothesis generation, while supporting quantitative eval!
CREATE provides a concrete testbed for evaluating associative creativity in LLMs and highlights the need for further work to fully leverage LLMs for such tasks!
CREATE evaluates whether models can construct interesting and distinct paths to connect concepts in their parametric knowledge. This mirrors associative reasoning in writing & scientific ideation: paths must be coherent, factually grounded, and conceptually diverse!