UC-Berkeley Postdoc🐻, Scientific Consultant AnthropicAI🏔️, Evomics Workshop Codirector🧬 prev Vanderbilt PhD, FutureHouse/Edison, Latch, Mantle (acquired) 🌲 https://linktr.ee/jlsteenwyk 📍 https://jlsteenwyk.com
🧬Jacob L Steenwyk
Manuscript forthcoming. Just wanted to share these results ahead of the article. Really grateful for the inspiration from folks at teams like GoodFireAI and AnthropicAI, among others. Really grateful for the open-source models from AllenAI, Meta, Mistral, Alibaba-Qwen & Google.
3d
Probes are at deep layers (~70% of model depth), where epistemic axes are most separable. Visualization is t-SNE 3D. Local neighborhood structure is preserved, so cluster identity is meaningful (treat global distances with caution).
How I found them: for each state, I wrote pairs of prompts: one designed to elicit it (e.g., asking about a fabricated mechanism, which corresponds to confabulating) and one neutral baseline. The mean activation difference between the two gives a direction for that state.
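A minimal sketch of the mean-difference step, assuming activations have already been extracted as plain lists of floats (one vector per prompt). The function names and toy numbers are hypothetical and are not taken from the forthcoming manuscript; in practice this would run on real hidden states with NumPy or PyTorch.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def state_direction(eliciting, baseline):
    """Difference of means: mean(eliciting) - mean(baseline).

    Assumption: both inputs hold activations from the same layer and
    token position, one vector per prompt.
    """
    mu_eliciting = mean_vector(eliciting)
    mu_baseline = mean_vector(baseline)
    return [e - b for e, b in zip(mu_eliciting, mu_baseline)]


# Toy 3-dimensional "activations" (made-up numbers, for illustration only).
eliciting = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

direction = state_direction(eliciting, baseline)
print(direction)
```

With equal numbers of eliciting and baseline prompts, the difference of means equals the mean of the per-pair differences, so either reading of "mean activation difference" gives the same vector.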
These directions are causally relevant! Adding the "confabulating" direction during inference increases confab rates. Subtracting it from wrong-answer activations rescues the correct answer in up to 32% of cases (OLMo 3).
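A rough sketch of what adding or subtracting a direction could look like on a single hidden-state vector. The thread does not say whether the direction is normalized or how strongly it is applied, so the unit-normalization and the `alpha` scale below are assumptions, and `steer` is a hypothetical helper rather than the author's code.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction has zero norm")
    return [x / norm for x in v]


def steer(hidden, direction, alpha):
    """Return hidden + alpha * unit(direction).

    alpha > 0 pushes the activation toward the state,
    alpha < 0 pushes it away (the "subtracting" case).
    """
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]


# Toy values, for illustration only.
hidden = [1.0, 1.0, 0.0]
confab_direction = [2.0, 0.0, 0.0]

added = steer(hidden, confab_direction, alpha=4.0)
removed = steer(hidden, confab_direction, alpha=-4.0)
print(added, removed)
```

In a real model this update would be applied to the residual stream at the probed layer during the forward pass, for example through a forward hook.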
I started with 15 candidate states across 4 categories: self-knowledge, world-knowledge, reasoning mode, and epistemic stance. 9 survive a strict bar: k-NN purity ≥ 0.90 in every model. The other 6 collapse into neighbors in at least one model (e.g., "certain" ≈ "recalling").
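One way the purity bar could be computed, as a sketch. The thread does not define k-NN purity, so this assumes a common reading: for each point, the fraction of its k nearest neighbors (Euclidean, excluding itself) that share its label, averaged per label. The choice of `k`, the function name, and the toy points are all hypothetical.

```python
import math
from collections import defaultdict


def knn_purity(points, labels, k):
    """Per-label mean fraction of same-label points among each point's k nearest neighbors."""
    if k < 1 or len(points) < 2:
        raise ValueError("need k >= 1 and at least two points")
    per_label = defaultdict(list)
    for i, p in enumerate(points):
        # Distances to every other point, paired with that point's label.
        others = [
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        neighbors = others[:k]
        same = sum(1 for _, lab in neighbors if lab == labels[i])
        per_label[labels[i]].append(same / len(neighbors))
    return {lab: sum(vals) / len(vals) for lab, vals in per_label.items()}


# Two well-separated toy clusters (made-up coordinates).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["guessing"] * 3 + ["deriving"] * 3

purity = knn_purity(points, labels, k=2)
survives = {lab: score >= 0.90 for lab, score in purity.items()}
print(purity, survives)
```

Under this reading, a state would survive only if its score clears 0.90 when the same check is repeated on every model.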
It is interesting that 5 different model architectures, built with different training pipelines by different teams, converge on roughly the same epistemic geometry. This may therefore be a general property of how next-token prediction organizes "knowing."
NEW results on the #geometry of "knowing" in #LLMs. Where does an LLM's "I'm just guessing" live? Its "I'm fabricating"? Its "I'm deriving step by step"? The answer is in distinct areas of #activation space -- and this shared geometry of #epistemic states is observed across diverse #OS models.