learning @IFM_MBZUAI, Silicon Valley Lab // Ph.D. @UTAustin
Hongli Zhan
What does a scientific figure make you wonder?
We introduce MQUD: multimodal Questions Under Discussion for scientific figures.
With 1,250 author-annotated questions over 245 figures from 56 papers, MQUD asks what scientific question a figure raises in context.
In multi-turn conversation, LLMs tend to repeat the same kinds of things over and over. The words may differ, but we found they are the *same discourse moves*!
Introducing @hongli-zhan.bsky.social's new work: novel discourse-level diversity rewards in post-training:
[7/7] This is the last paper of my PhD at UT Austin, wrapping up 5 years of work on emotionally intelligent AI.
Huge thanks to my advisor @jessyjli.bsky.social, and co-authors Emma Gueorguieva, Javier Hernandez, Jina Suh, and @desmond-ong.bsky.social!
Yating Wu
[2/7] Ask ChatGPT to comfort someone 10 times. You'll notice it always does the same moves: reflect, validate, suggest.
Human counselors don't do this. They adapt -- sometimes they challenge, sometimes they stay quiet, sometimes they share a story.
[6/7] Models:
huggingface.co/hongli-zhan/...
huggingface.co/hongli-zhan/...
Code and data:
github.com/honglizhan/m...
[3/7] We call this "tactic stickiness" -- when a model locks onto the same empathic moves turn after turn.
We formalize it and find: LLMs reuse the same tactic sequences FAR more than human supporters. And standard metrics (BLEU, BERTScore) completely miss it.
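One simple way to picture tactic stickiness is as the overlap between the tactics used in consecutive turns. The sketch below is an illustration only, not the paper's formalization: the function name, the tactic labels, and the choice to treat each turn as an unordered set of tactics compared against the previous turn are all assumptions.

```python
from typing import List, Set


def tactic_repetition_rate(turns: List[Set[str]]) -> float:
    """Mean fraction of each turn's tactics that also appeared in the previous turn.

    Hypothetical proxy for "tactic stickiness"; the paper's own definition
    may differ (for example, it may score ordered tactic sequences).
    """
    overlaps = []
    for prev, curr in zip(turns, turns[1:]):
        if not curr:
            continue  # skip turns with no labeled tactics
        overlaps.append(len(curr & prev) / len(curr))
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


# Illustrative, made-up tactic labels for two three-turn conversations.
sticky = [{"reflect", "validate", "suggest"}] * 3
varied = [{"reflect", "validate"}, {"challenge"}, {"share_story", "validate"}]

sticky_rate = tactic_repetition_rate(sticky)  # every tactic repeats each turn
varied_rate = tactic_repetition_rate(varied)  # no tactic carries over to the next turn
```

Because the score depends only on tactic labels, two replies with entirely different wording can still count as fully repetitive, which is the kind of repetition surface-overlap metrics would not register.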
[4/7] Our fix: MINT (Multi-turn Inter-tactic Novelty Training).
We use GRPO to reward models for diversifying their support tactics across turns, without sacrificing empathy quality.
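As a rough sketch of how a diversity reward could plug into GRPO-style training: each sampled reply gets a quality score plus a bonus for tactics not yet used in the conversation, and rewards are then normalized within the group of samples. The additive combination, the `weight` value, the novelty definition, and all names below are assumptions for illustration, not MINT's actual reward.

```python
from statistics import mean, pstdev
from typing import List, Set


def novelty(curr: Set[str], history: List[Set[str]]) -> float:
    """Fraction of the current turn's tactics not seen in any earlier turn (assumed definition)."""
    if not curr:
        return 0.0
    seen = set().union(*history)
    return len(curr - seen) / len(curr)


def combined_reward(quality: float, curr: Set[str], history: List[Set[str]],
                    weight: float = 0.5) -> float:
    """Quality score plus a weighted novelty bonus; the weight is a hypothetical knob."""
    return quality + weight * novelty(curr, history)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of sampled replies, as GRPO does."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two candidate replies with equal (made-up) quality scores but different tactics.
history = [{"reflect", "validate"}]
candidates = [({"reflect", "validate"}, 0.8), ({"challenge"}, 0.8)]
rewards = [combined_reward(q, tactics, history) for tactics, q in candidates]
advantages = group_relative_advantages(rewards)
```

In this toy setup the reply that introduces a new tactic receives the higher advantage even though both replies have the same quality score, so quality is held fixed while variety is rewarded.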
New paper! Last one from my PhD at UT Austin.
LLMs sound empathic but repeat the same discourse moves turn after turn, at 2x the rate of humans.
We built MINT, the first RL framework for discourse move diversity in empathic dialogue. +25% empathy, -26% repetition.
arxiv.org/abs/2604.11742
[5/7] Results: MINT improves empathy by 25% while reducing tactic repetition by 26%.
A 4B model trained with MINT surpasses all baselines, including quality-only RL and token-level diversity methods. You need discourse-level signals, not just token-level diversity.
1k+ downloads each on the MINT empathy models since release. Encouraging to see the interest in our work!
tl;dr: In multi-turn empathic dialogue, LLMs reuse the same discourse moves far more often than humans do; MINT uses RL to diversify them. Give it a try!
huggingface.co/hongli-zhan/...