There’s plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases — which is where bias actually matters.
IssueBench, our attempt to fix this, has been accepted at TACL, and I will be at #EMNLP2025 next week to talk about it!
New results 🧵
Paul Röttger
Are LLMs biased when they write about political issues?
We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.
Long 🧵 with spicy results 👇
Paul Röttger
Olmo 3 is notable as a "fully open" LLM - all of the training data is published, plus complete details on how the training process was run. I tried out the 32B thinking model and the 7B instruct models, + thoughts on why transparent training data is so important simonwillison.net/2025/Nov/22/...
We're at #NeurIPS2025 with papers, posters, workshops, fireside chats, & talks across the conference. Come learn about our latest research + see live demos!
Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open-weight models, these are notable for including the full training data, training process and checkpoints along …
🗓️ SwissText 2026 keynote speakers announced & registration open!
We are delighted to welcome Prof. Dr. Alexandra Birch and Dr. Valentina Pyatkin as our keynote speakers.
📋 Register here: ema.uzh.ch/RHK4W
Early-bird rates available throughout April, with additional student discounts.
#NLProc
1/3
Front Conference Zurich is coming up soon! On Friday, February 27, an amazing group of speakers will explore how AI is reshaping the way we work, from creativity and product design to engineering and collaboration.
🤩 Our lineup: frontconference.com/schedule
🎟️ Your ticket: frontconference.com/tickets
Olmo 3 is out! 🤩
I am particularly excited about Olmo 3 models' precise instruction following abilities and their good generalization performance on IFBench!
Lucky to have been a part of the Olmo journey for three iterations already.
🚨 New Study 🚨
@arxiv.bsky.social has recently decided to prohibit any 'position' paper from being submitted to its CS servers.
Why? Because of "AI slop" and an allegedly higher share of LLM-generated content in review papers than in non-review papers.
I will be giving a talk at @eth-ai-center.bsky.social next week, on RLVR for verifiable instruction following, generalization, and reasoning! 📢
Join if you are in Zurich and interested in hearing about IFBench and our latest Olmo and Tülu work at @ai2.bsky.social