[3/4] Do VLMs actually ground in the figure?
Fine-tuning Qwen3.5-9B on MQUD makes generated questions more grounded in the figure and more specific to the paper's scientific content.
What does a scientific figure make you wonder?
We introduce MQUD: multimodal Questions Under Discussion for scientific figures.
With 1,250 author-annotated questions over 245 figures from 56 papers, MQUD asks what scientific question a figure raises in context.
[2/4] These questions often require reasoning across the figure and paper text:
Why does this curve shift?
What comparison is scientifically meaningful?
What claim is this figure supporting?
Check out our paper for more results and analysis!
arxiv.org/abs/2504.09373
github.com/AlliteraryAl...
This was a fun collaboration with @yatingwu.bsky.social @asher-zheng.bsky.social @manyawadhwa.bsky.social @gregdnlp.bsky.social @jessyjli.bsky.social
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper, led by Jan Trienes: an interpretable framework for salience analysis in LLMs.
First of all, information salience is a fuzzy concept. So how can we even measure it? (1/6)
We at UT Linguistics are hiring for 🔥 2 faculty positions in Computational Linguistics! Assistant or Associate professors, deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!
Apply here: apply.interfolio.com/158280
✨New paper✨
Linguistic evaluations of LLMs often implicitly assume that language is generated by symbolic rules.
In a new position paper, @adelegoldberg.bsky.social, @kmahowald.bsky.social and I argue that languages are not Lego sets, and evaluations should reflect this!
arxiv.org/pdf/2502.13195
I did a starter pack of ML/AI people at @utaustin.bsky.social Please distribute and feel free to self nominate!
go.bsky.app/QLQznZg
1k+ downloads each on the MINT empathy models since release 🔥 Encouraging to see the interest in our work!
tl;dr: In multi-turn empathic dialogue, LLMs reuse the same discourse moves far more often than humans do; MINT uses RL to diversify them. Give it a try!
huggingface.co/hongli-zhan/...
Yating Wu
Ramya Namuduri
Jessy Li
Leonie Weissweiler
Atlas Wang
Hongli Zhan
New paper! Last one from my PhD at UT Austin.
LLMs sound empathic but repeat the same discourse moves turn after turn, at 2x the rate of humans.
We built MINT 🌿, the first RL framework for discourse move diversity in empathic dialogue. +25% empathy, -26% repetition.
arxiv.org/abs/2604.11742