PhD student in Social Data Science @ University of Mannheim | jasoju.github.io
Jana Jung
Are you using survey-style questionnaires designed for humans to measure characteristics of LLMs?
In our #EACL2026 paper, we evaluate both the reliability and validity of such tests and find that their scores do not reflect real-world model behavior. In fact, they can be deceptive!
🧵1/3
LLMs can generate synthetic survey responses, e.g. for imputation, but how reliable are they? 📋
At #IC2S2, I'll be sharing our research on the robustness of AI-generated responses to perturbations and whether they mirror human survey biases. 🤖
Come by my poster on Tuesday between 1:30 and 3:30 p.m.
For all 3 constructs we looked at (sexism, racism, and morality), the correlations between test scores and behavior in a related downstream task are only weakly positive, or even negative.
📢 Our results call for LLM-specific evaluations instead of applying tests originally developed for humans.
2/3
📄 Paper: arxiv.org/abs/2510.11254
A very big thank you to my amazing collaborators @marlutz.bsky.social, @indiiigo.bsky.social, and @mstrohm.bsky.social!
3/3
👋 #ACL2025NLP 🇦🇹 @marlutz.bsky.social and I are presenting our poster on demographic representativeness of LLMs today!
🕦 10:30-12:00
📍 Hall X5 (board 1 or 14 according to different sources 🧐)
Here’s the paper in the ACL Anthology: aclanthology.org/2025.finding...
Drop by!
🚨New paper alert🚨
🤔 Ever wondered how the way you write a persona prompt affects how well an LLM simulates people?
In our #EMNLP2025 paper, we find that using interview-style persona prompts makes LLM social simulations less biased and more aligned with human opinions.
🧵1/7
Jens Rupprecht
Marlene Lutz
Poster Session 1 is live in the Atrium! Explore the work and cast your daily vote—just scan the QR code to submit your favorite poster ID. Everyone has one vote per day. #ic2s2
Chair for Data Science in the Economic and Social Sciences at University of Mannheim having lots of fun at #ic2s2 @janajung.bsky.social @wanlo.bsky.social @indiiigo.bsky.social @jrupprec.bsky.social @maximiliankreutner.bsky.social and Stefano Balietti
Indira Sen
International Conference on Computational Social Science
Markus Strohmaier
aclanthology.org
Indira Sen, Marlene Lutz, Elisa Rogers, David Garcia, Markus Strohmaier. Findings of the Association for Computational Linguistics: ACL 2025. 2025.
Psychometric tests are increasingly used to assess psychological constructs in large language models (LLMs). However, it remains unclear whether these tests -- originally developed for humans -- yield...
Thrilled to talk about how seemingly small decisions in silicon sampling can have a large impact on simulated survey responses 👀 Join us on Oct 29th! 👈
Really excited to also present this work at #IC2S2 next week in Norrköping! 🎉 I'd love to discuss how to produce LLM survey responses at my poster on Wed at 13:30 (Poster Session 2, Poster ID 68) 📊
Do LLMs represent the people they're supposed to simulate or provide personalized assistance to?
We review the current literature in our #ACL2025 Findings paper and investigate what researchers conclude about the demographic representativeness of LLMs:
osf.io/preprints/so...
1/
Indira Sen
Georg Ahnert
🚨 Upcoming #CS3Meeting 🚨
@wanlo.bsky.social talks about analytic flexibility in silicon samples on October 29 (3:15 to 4:00 PM CET).
Great opportunity to gain novel insights into how survey responses can be generated with #LLMs.
Sign up now: ww3.unipark.de/uc/cs3_meeti...
LLMs are trained to produce open-ended responses 📝, but most survey items require closed-ended responses instead 📊
This Wed 11:00–12:30 at #ESRA25, I'll discuss the large impact that Answer Production Methods have on prediction results + share recommendations for methods and parameters. 👈