💼 In business scenarios (selling defective products), models were either completely honest OR completely deceptive
🌐 In public image scenarios (reputation management), behaviors were more ambiguous and complex 4/

🔄 Multi-turn interactive setup is crucial - models often begin with equivocation but shift to falsification when pressed for clear answers 🧠 Stronger models like GPT-4o showed the greatest shift when prompted to deceive (40% increase in falsification; alarming) 6/

Check out our paper to learn more about how LLMs navigate these ethical dilemmas: arxiv.org/abs/2409.09013. 7/ #AI #MachineLearning #AIEthics #LLMs #nlp #NLProc #NAACL2025

LLM agent simulations for policy: A field full of potential, yet clouded by myths and big questions. 🏛️🤖 We’re opening a new venue to spark open discussion and drive this research forward. Join the conversation! 🧵

Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment that mimics the realistic setting in which users constantly try to seek info from LLMs 2/

Apr 28, 2025

To be safely and successfully deployed, LLMs must simultaneously satisfy truthfulness and utility goals. Yet, often these two goals compete (e.g., an AI agent assisting a used car salesman selling a c...

arxiv.org

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

Xuhui Zhou

🚨 New CHI 2026 Workshop 🚨 PoliSim@CHI 2026: LLM Agent Simulation for Policy

Yuxuan Li

When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals that models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/

Apr 28, 2025

Wonderful collaborations with Zhe Su, Anubha Kabra, Sanketh Rangreji, @jmendelsohn2.bsky.social, @faeze_brh, @maartensap.bsky.social