PhD student in Social Data Science @ University of Mannheim | jasoju.github.io

Jana Jung

Are you using survey-style questionnaires designed for humans to measure characteristics of LLMs? In our #EACL2026 paper, we evaluate both the reliability and validity of such tests and find that their scores do not reflect real-world model behavior. In fact, they can be deceptive! 🧵1/3

3mo

LLMs can generate synthetic survey responses, e.g. for imputation, but how reliable are they? 📋 At #IC2S2, I'll be sharing our research on the robustness of AI-generated responses to perturbations and whether they mirror human survey biases. 🤖 Come by my poster on Tuesday between 1:30 and 3:30 p.m.

For all 3 constructs we looked at (sexism, racism, and morality), the correlations between test scores and behavior in a related downstream task are only weakly positive, or even negative. 📢 Our results call for LLM-specific evaluations instead of applying tests originally developed for humans. 2/3

11mo

Jana Jung

📄 Paper: arxiv.org/abs/2510.11254 A very big thank you to my amazing collaborators @marlutz.bsky.social, @indiiigo.bsky.social, and @mstrohm.bsky.social! 3/3

3mo

👋 #ACL2025NLP 🇦🇹 @marlutz.bsky.social and I are presenting our poster on demographic representativeness of LLMs today! 🕦 10:30-12:00 📍 Hall X5 (board 1 or 14 according to different sources 🧐) Here’s the paper in the ACL Anthology: aclanthology.org/2025.finding... Drop by!

Jana Jung

🚨New paper alert🚨 🤔 Ever wondered how the way you write a persona prompt affects how well an LLM simulates people? In our #EMNLP2025 paper, we find that using interview-style persona prompts makes LLM social simulations less biased and more aligned with human opinions. 🧵1/7

Jana Jung

10mo

Jens Rupprecht

7mo

Marlene Lutz

Poster Session 1 is live in the Atrium! Explore the work and cast your daily vote—just scan the QR code to submit your favorite poster ID. Everyone has one vote per day. #ic2s2

Chair for Data Science in the Economic and Social Sciences at University of Mannheim having lots of fun at #ic2s2 @janajung.bsky.social @wanlo.bsky.social @indiiigo.bsky.social @jrupprec.bsky.social @maximiliankreutner.bsky.social and Stefano Balietti

11mo

Indira Sen

International Conference on Computational Social Science

Markus Strohmaier

aclanthology.org

Indira Sen, Marlene Lutz, Elisa Rogers, David Garcia, Markus Strohmaier. Findings of the Association for Computational Linguistics: ACL 2025. 2025.

Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs

Psychometric tests are increasingly used to assess psychological constructs in large language models (LLMs). However, it remains unclear whether these tests -- originally developed for humans -- yield...

arxiv.org

Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality

Thrilled to talk about how seemingly small decisions in silicon sampling can have a large impact on simulated survey responses 👀 Join us on Oct 29th! 👈

Really excited to also present this work at #IC2S2 next week in Norrköping! 🎉 I'd love to discuss how to produce LLM survey responses at my poster on Wed at 13:30 (Poster Session 2, Poster ID 68) 📊

8mo

11mo

Do LLMs represent the people they're supposed to simulate or provide personalized assistance to? In our #ACL2025 Findings paper, we review the current literature and investigate what researchers conclude about the demographic representativeness of LLMs: osf.io/preprints/so... 1/

11mo

Indira Sen

Georg Ahnert

🚨 Upcoming #CS3Meeting 🚨 @wanlo.bsky.social talks about analytic flexibility in silicon samples on October 29 (3:15 to 4:00 PM CET). Great opportunity to gain novel insights into how survey responses can be generated with #LLMs. Sign up now: ww3.unipark.de/uc/cs3_meeti...

8mo

LLMs are trained to produce open-ended responses 📝, but most survey items require closed-ended responses instead 📊 This Wed 11:00–12:30 at #ESRA25, I'll discuss the large impact that Answer Production Methods have on prediction results + share recommendations for methods and parameters. 👈

11mo

Georg Ahnert

Joshua Claassen