1/8
Do LLMs have stable personalities? We ran 2 million tests. (Spoiler: no.) 🧵
Paper accepted at AAAI 2026 - Alignment Track
Safe deployment requires behavioral consistency. We found persistent instability across scales, reasoning modes, and personas. ⤵️
2/8
We tested 25 open-source models (1B-685B params) across 2M+ responses to personality questionnaires (BFI, Short Dark Triad), systematically varying question order, paraphrasing, personas, and reasoning modes. ⤵️
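For readers who want a feel for the setup, here is a minimal sketch of one perturbation (question order) and a per-item spread measure. It is not the paper's code: `ask_model`, the three items, and the use of standard deviation as the variability measure are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical items; not the actual BFI or Short Dark Triad wording.
ITEMS = ["I am talkative.", "I tend to be organized.", "I worry a lot."]


def ask_model(item: str, position: int) -> int:
    """Stand-in for a real model call; returns a 1-5 Likert score.

    This toy version lets the item's position nudge the answer,
    mimicking sensitivity to question order.
    """
    base = 1 + len(item) % 5
    return min(5, base + position % 2)


def run_once(rng: random.Random) -> dict:
    """Administer the questionnaire once in a shuffled order."""
    order = ITEMS[:]
    rng.shuffle(order)
    return {item: ask_model(item, pos) for pos, item in enumerate(order)}


def item_variability(n_runs: int = 50, seed: int = 0) -> dict:
    """Population standard deviation of each item's score across runs."""
    rng = random.Random(seed)
    runs = [run_once(rng) for _ in range(n_runs)]
    return {item: statistics.pstdev([r[item] for r in runs]) for item in ITEMS}


if __name__ == "__main__":
    for item, spread in item_variability().items():
        print(f"{spread:.2f}  {item}")
```

A perfectly consistent model would score 0 on every item; anything above that is instability introduced by ordering alone.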
4/8
Finding 2: Chain-of-thought reasoning INCREASES variability while DECREASING perplexity
Models become more confident yet less consistent: spelling out their reasoning paradoxically undermines reliability.
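Why these two can move in opposite directions: perplexity is computed within a single response, while variability is computed across responses. A rough sketch, using the standard definition of perplexity and made-up numbers (none of the values below come from the paper):

```python
import math
import statistics


def perplexity(token_logprobs):
    """exp of the mean negative log-probability of the generated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Made-up illustration: each run is confident in its own tokens...
confident_run = [math.log(0.9)] * 10
hesitant_run = [math.log(0.5)] * 10

# ...but the Likert answers across repeated runs can still disagree.
scores_with_cot = [1, 5, 2, 4, 3]      # hypothetical answers, with reasoning
scores_without_cot = [3, 3, 3, 4, 3]   # hypothetical answers, no reasoning

if __name__ == "__main__":
    print("perplexity (confident):", round(perplexity(confident_run), 3))
    print("perplexity (hesitant): ", round(perplexity(hesitant_run), 3))
    print("spread with CoT:   ", round(statistics.pstdev(scores_with_cot), 3))
    print("spread without CoT:", round(statistics.pstdev(scores_without_cot), 3))
```

Low perplexity only says each individual answer was generated confidently; it says nothing about whether the next run gives the same answer.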
7/8
What this means: Current LLMs may lack architectural foundations for genuine behavioral consistency.
Training on diverse text creates models simulating myriad personalities in superposition. Post-training may be a brittle stabilization attempt.