PhD candidate in CS at Northeastern University | NLP + HCI for health | she/her 🏃♀️🧅🌈
Hye Sun Yun
This is a great follow-up to our recent preprint! This small-scale evaluation introduces a framing-resistant prompt and makes a step toward exploring the mitigation space for the framing sensitivity problem.
This framing effect is further amplified in multi-turn conversations, where sustained persuasion increases inconsistency. [3/6]
Our conclusion: LLM medical responses vary based on question phrasing alone, despite identical underlying evidence. For patients and consumers, how you ask may determine what you're told. [5/6]
We also compared technical terms with plain-language terms in our questions, but found no meaningful differences between the two language styles. [4/6]
Patients ask LLMs medical questions — but how they phrase it matters more than it should.
Our new preprint explores how different phrasings of patient health questions can lead to inconsistent conclusions, even with the same evidence. [1/6]
Full Paper: arxiv.org/abs/2604.05051
"Does this work?" vs "Does this not work?" Are conclusions different even though the LLM was given the same evidence documents?
Yes. Positive vs negative framing leads to more contradictory conclusions than responses from the positive question sampled twice. [2/6]
I would like to thank my amazing co-authors!
Geetika Kapoor, @mackert.bsky.social, @ramezkouzy.bsky.social, @cocoweixu.bsky.social, @jessyjli.bsky.social, and @byron.bsky.social. [6/6]
Please check out our full findings here: arxiv.org/abs/2604.05051
Whether for working co-ops far and wide or strengthening community close to home, Khoury College's students have made their mark. Per annual tradition, ten were recognized this spring for their achievements.
Read more: https://bit.ly/4uP2MtI
Thrilled to share that our research showing how LLMs can be influenced by bias from "spun" medical literature is now featured in Northeastern's Khoury news! The work offers critical insights as AI enters healthcare.
The full paper can be found at arxiv.org/abs/2502.07963