PhD @CMU LTI https://eeelisa.github.io/

Mingqian Zheng

The safety-utility tradeoff isn't a fixed property of models. It's largely unresolved ambiguity that multi-turn interaction can resolve. The question isn't whether a model refuses — it's whether it can revise. paper: arxiv.org/abs/2604.27093

Huge thanks to my amazing collaborators: Malia Morgan, @liweijiang.bsky.social, @carolynrose.bsky.social, @maartensap.bsky.social!!

Finding 3: What users do drives recovery. Each intent-revealing follow-up adds ~10.3% utility, and the most efficient move is just explaining your purpose. What backfires: pushback drops utility with no safety gain, and even disengagement ("hmm") makes models more cautious.

Finding 2: Hard refusals at turn 1 give NO lasting safety advantage. Conversations that open with a hard refusal recover the most utility once users clarify (0 → 48.4%), but all conversations converge to similar harmfulness scores by the end, regardless of how conservatively the model started.

LLMs refuse ambiguous queries that look harmful but aren't. Can they recover once users clarify, while staying safe? Our new interactive multi-turn benchmark measures both. 🚨 Turns out: not both at once.

We introduce Ben-Util, a new checklist-based metric that captures the user's safe info need. With it, we identify three failure modes single-turn evals can't see: utility lock-in, unsafe recovery, and repetitive recovery.

Reading social media stories evokes a wide range of contextual reader reactions—inferential, affective, evaluative—yet we lack methods to study these at scale. Excited to share our new paper that builds a framework for analyzing storytelling practices across online communities!

We build CarryOnBench: 398 seemingly-harmful queries with human-validated benign intents, simulated into 5,970 conversations (4-12 turns) via user follow-ups grounded in negotiation theory, totaling ~23.9k model responses.

Finding 1: Utility recovery isn't free, and it isn't uniform. 13 of 14 models meet or exceed their oracle utility with multi-turn clarification, but the safety cost varies wildly.
