It is interesting that 5 different model architectures, built with different training pipelines by different teams, converge on roughly the same epistemic geometry.
This may therefore be a general property of how next-token prediction organizes "knowing."
I started with 15 candidate states across 4 categories: self-knowledge, world-knowledge, reasoning mode, and epistemic stance.
9 survive a strict bar: k-NN purity ≥ 0.90 in every model. The other 6 collapse into their neighbors in at least one model (e.g. "certain" ≈ "recalling").
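
To make the survival bar concrete, here is a minimal sketch of how such a filter could be computed. It is not the exact pipeline used here: it assumes k-NN purity means "the fraction of a point's k nearest neighbors (Euclidean, self excluded) that share its state label, averaged per state", and the function names, the default `k=5`, and the input layout are all hypothetical.

```python
import numpy as np


def knn_purity_per_state(X, labels, k=5):
    """Per-state k-NN purity (assumed definition, see lead-in).

    X:      (n_samples, n_dims) array of representations from ONE model.
    labels: length-n_samples sequence of state names.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of each point
    nn = np.argsort(d2, axis=1)[:, :k]

    # Fraction of those neighbors carrying the same label as the point
    point_purity = (labels[nn] == labels[:, None]).mean(axis=1)

    # Average the per-point purity within each state
    return {
        str(s): float(point_purity[labels == s].mean())
        for s in np.unique(labels)
    }


def surviving_states(purity_by_model, threshold=0.90):
    """States whose purity meets the threshold in EVERY model.

    purity_by_model: {model_name: {state_name: purity}}
    """
    per_model = list(purity_by_model.values())
    shared = set.intersection(*(set(p) for p in per_model))
    return sorted(s for s in shared if all(p[s] >= threshold for p in per_model))
```

Under these assumptions, you would run `knn_purity_per_state` once per model on that model's representations, collect the results in a dict keyed by model name, and pass it to `surviving_states`. A state that blends into a neighbor in even one model drops out, which is what makes the bar strict.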