Major AI companies are increasingly embedding sponsored content into chatbot conversations.
Across two preregistered experiments (N=2,012), we test how effectively AI can steer consumers toward sponsored products in a realistic shopping scenario.
https://arxiv.org/abs/2604.04263
Good work from @hayoungjung.bsky.social and @manoelhortaribeiro.bsky.social
Scientific AI agents are actively being deployed to synthesize clinical conclusions, but their factual accuracy remains remarkably low.
#MedSky
Direct link: arxiv.org/pdf/2606.11337
Excited to share our new preprint, “AI Assistance for Discretionary Work: Increasing Feedback Provision in Higher Education”: arxiv.org/abs/2606.03095
A thread 🧵 1/8
Deepfake pornography isn't going away just because we are passing laws and taking down a couple of big websites.
Our new preprint, led by @aedcv.bsky.social, suggests that the sharing of this material continued to prosper even after platform and policy shocks.
arxiv.org/abs/2602.02754
Francesco Salvi
Scott McGrath
First paper of my PhD with my amazing advisors!
There's been a ton of hype and media coverage on OpenEvidence as an “AI co-pilot for clinicians”… and our long-horizon benchmark puts them to the test!! Our results suggest they are far from reliable for downstream use.
Romina Mahinpei
One thing we also didn't expect while building this benchmark: AI agents kept “cheating.”
Even when told not to, they searched the web for ground-truth answers. So we built a clean-room harness to filter answer-leaking results.
We're now exploring this more deeply in follow-up work.
Broadly interested in computational social science, AI safety & evaluation, NLP for social good & applications (in public health, science...)!
Happy to chat or grab coffee at the conference! Feel free to DM me :)
I am at #EMNLP2025 🇨🇳 to present our main paper *MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform*! Come by to discuss details!
Location: Hall C
⏲️ Time: 11AM-12:30PM
Paper: aclanthology.org/2025.emnlp-m...
Repo: github.com/hayoungjungg...
New preprint!
We introduce a new benchmark, SciConBench, with 9.11k scientific questions derived from Cochrane Systematic Reviews.
We find evidence that frontier AI agents **cannot** synthesize scientific conclusions well.
A thread 🧵
w/ @hayoungjung.bsky.social & others!
Whoa, excellent study just dropped in Science!
"Reranking partisan animosity in algorithmic social media feeds alters affective polarization"
www.science.org/doi/10.1126/...
Led by @tiziano.bsky.social and @msaveski.bsky.social