See arxiv.org/abs/2602.04081 for the preprint
Extended now across speech-audio models, more fMRI subjects, and ECoG.
Both dimensionality, i.e., ability to represent complex features, and brain predictivity grow with linguistic capabilities over LLM training.
Dimensionality (proxying linguistic abstraction) explains away surprisal’s effect on brain-likeness. This suggests that representing complex linguistic features drives brain-model similarity. Next-token prediction is just one task among possibly many that elicits this ability.
Do you use a pronoun more often when the entity you’re talking about is more predictable?
Previous work offers diverging answers, so we conducted a meta-analysis combining data from 20 studies across 8 different languages.
Now out in Language: muse.jhu.edu/article/969615
Does dimensionality *cause* brain predictivity?
❌High-dimensional random features don't predict the brain!
➡️Learning good linguistic abstractions results in feature spaces that are higher-dimensional and more brain-like. Dimensionality per se is not a causal driver.
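A toy sketch of the random-features control, on synthetic data only. Everything here is an assumption for illustration: the closed-form ridge encoding model, the held-out Pearson correlation as the "brain predictivity" score, and the hypothetical names `ridge_fit` and `heldout_corr`. The data are constructed so that low-rank "structured" features share latent causes with the simulated responses while the 200-dimensional random features do not, so the outcome is built in rather than evidence for the claim.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression weights for centered X and Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def heldout_corr(X_tr, Y_tr, X_te, Y_te, alpha=10.0):
    """Mean Pearson correlation between predicted and true held-out responses."""
    mu_x, mu_y = X_tr.mean(axis=0), Y_tr.mean(axis=0)
    W = ridge_fit(X_tr - mu_x, Y_tr - mu_y, alpha)
    pred = (X_te - mu_x) @ W + mu_y
    pc = pred - pred.mean(axis=0)
    yc = Y_te - Y_te.mean(axis=0)
    r = (pc * yc).sum(axis=0) / (
        np.linalg.norm(pc, axis=0) * np.linalg.norm(yc, axis=0)
    )
    return float(r.mean())

rng = np.random.default_rng(0)
n_train, n_test, n_voxels = 300, 200, 10
n = n_train + n_test

# Simulated "brain" responses driven by 5 latent stimulus properties plus noise.
latent = rng.normal(size=(n, 5))
responses = latent @ rng.normal(size=(5, n_voxels)) + rng.normal(size=(n, n_voxels))

# Structured features: 20 dims, but only rank 5, and tied to the latents.
structured = latent @ rng.normal(size=(5, 20))
# Random features: 200 dims, full rank, unrelated to the latents.
random_feats = rng.normal(size=(n, 200))

tr, te = slice(0, n_train), slice(n_train, None)
r_structured = heldout_corr(structured[tr], responses[tr], structured[te], responses[te])
r_random = heldout_corr(random_feats[tr], responses[tr], random_feats[te], responses[te])
print(f"structured: {r_structured:.2f}  random: {r_random:.2f}")
```

In this toy setup the higher-dimensional feature space is the one that fails to generalize to held-out responses, which is the shape of the control described above.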
The layerwise correlation between dimensionality and brain predictivity was *highest* for voxels and electrodes in conventional fronto-temporal language areas.
...but how should we interpret dimensionality?...
Explicitly increasing brain predictivity by finetuning layers on fMRI responses also *increased* both dimensionality and semantic content.
So far dimensionality, linguistic abstraction, and brain predictivity seem related. But does dimensionality *cause* brain predictivity?
Presenting this at #ICML with @rjantonello.bsky.social and Aditya Vaidya✨
Why do 𝙢𝙞𝙙𝙙𝙡𝙚 layers in LLMs and speech-audio models best predict brain responses to language?
We show that a peak in the dimensionality of 🤖 activations (left) tracks high 🧠 predictivity (right)
🧵(cross-posted from X)
Good news everyone!
I’ll be presenting the paper I did with Marco and Iuri, "Prediction Hubs are Context-Informed Frequent tokens in LLMs", at the ELLIS UnConference on December 2nd in Copenhagen. arxiv.org/abs/2502.10201
High dimensionality signifies that your model has learned a nice, complex feature space of language.
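For intuition about what "dimensionality" of a layer's activations can mean, here is a minimal sketch using the participation ratio of the covariance spectrum. This is one common linear proxy and an assumption on my part, not necessarily the estimator used in the paper; the function name `participation_ratio` and the synthetic activations are hypothetical.

```python
import numpy as np

def participation_ratio(acts):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared eigenvalues.

    `acts` is an (n_samples, n_features) activation matrix. The squared
    singular values of the centered matrix are proportional to the
    covariance eigenvalues, and the ratio is invariant to that scale.
    """
    acts = np.asarray(acts, dtype=float)
    centered = acts - acts.mean(axis=0, keepdims=True)
    eig = np.linalg.svd(centered, compute_uv=False) ** 2
    total = eig.sum()
    if total == 0:
        return 0.0
    return float(total ** 2 / (eig ** 2).sum())

rng = np.random.default_rng(0)
# Synthetic "layer" whose 64 units all vary along one direction.
low = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 64))
# Synthetic "layer" whose 64 units vary independently.
high = rng.normal(size=(500, 64))

pr_low = participation_ratio(low)
pr_high = participation_ratio(high)
print(f"rank-1 activations: {pr_low:.1f}  isotropic activations: {pr_high:.1f}")
```

Computing a score like this per layer and plotting it against depth is one way to look for a mid-layer peak.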
The dimensionality peak in LLMs (and to a weaker extent, speech-audio models) marks a phase of higher-order linguistic abstraction, which we showed with probing.