AI safety researcher
Tommie Tosato







5mo

1/8 Do LLMs have stable personalities? We ran 2 million tests. (Spoiler: no.) 🧵
Paper accepted at AAAI 2026 - Alignment Track
Safe deployment requires behavioral consistency. We found persistent instability across scales, reasoning modes, and personas. ⤵️

2/8 We tested 25 open-source models (1B-685B params) across 2M+ responses to personality questionnaires (BFI, Short Dark Triad), systematically varying question order, paraphrasing, personas, and reasoning modes. ⤵️

3/8 Finding 1: Scaling provides limited stability gains.
Even 400B+ parameter models exhibit significant instability (SD > 0.3 on 5-point scales). Bigger ≠ stable.

4/8 Finding 2: Chain-of-thought reasoning INCREASES variability while DECREASING perplexity.
Models become more confident yet less consistent. Explanation paradoxically undermines reliability.

5/8 Finding 3: Conversation history cuts both ways.
It amplifies instability in smaller models (<50B) but reduces it in larger ones. Multi-turn interactions can progressively degrade behavioral predictability.

6/8 Finding 4: Misaligned personas show increased variability.
Antisocial/schizophrenia persona prompts increase inconsistency vs. baseline. Behavioral inconsistency itself may serve as a misalignment signal.

7/8 What this means: Current LLMs may lack architectural foundations for genuine behavioral consistency. Training on diverse text creates models simulating myriad personalities in superposition. Post-training may be a brittle stabilization attempt.

8/8 📄 Paper: arxiv.org/abs/2508.04826
💻 Code: github.com/tosatot/PERSIST
Thanks to co-authors: @saskiahelbling.bsky.social, @yjmantilla.bsky.social, Mahmood Hegazy, Alberto Tosato, David John Lemay, Irina Rish, @introspection.bsky.social
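
For readers wondering what "SD > 0.3 on 5-point scales" could look like in practice, here is a minimal sketch. It is not the PERSIST code: the function name `item_variability`, the toy data, and the choice of sample standard deviation are all assumptions made for illustration. It takes the same questionnaire items answered across several runs (for example, different question orders) and reports the per-item spread.

```python
from statistics import mean, stdev

# Hypothetical helper (not from the PERSIST repo): per-item spread of
# Likert scores (1-5) across repeated runs of the same questionnaire.
def item_variability(responses_by_run):
    """responses_by_run: list of runs; each run lists scores in the same item order."""
    n_items = len(responses_by_run[0])
    per_item_sd = []
    for i in range(n_items):
        scores = [run[i] for run in responses_by_run]
        # Sample standard deviation is an assumption; the paper may use another estimator.
        per_item_sd.append(stdev(scores))
    return per_item_sd

# Toy data: 3 runs (e.g. 3 question orderings) over 3 items.
runs = [
    [4, 2, 5],
    [4, 3, 5],
    [4, 1, 5],
]

sds = item_variability(runs)
mean_sd = mean(sds)
# 0.3 is the threshold quoted in the thread for "significant instability".
unstable_items = [i for i, sd in enumerate(sds) if sd > 0.3]
print(sds, mean_sd, unstable_items)
```

In this toy case only the second item changes between runs, so it is the only one flagged as unstable.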