Dimensionality (proxying linguistic abstraction) explains away surprisal’s effect on brain-likeness. This suggests that representing complex linguistic features drives brain-model similarity. Next-token prediction is just one task among possibly many that elicits this ability.
See arxiv.org/abs/2602.04081 for the preprint.
Extended now across speech-audio models, more fMRI subjects, and ECoG.
High dimensionality signifies that your model has learned a nice, complex feature space of language.
The dimensionality peak in LLMs (and to a weaker extent, speech-audio models) marks a phase of higher-order linguistic abstraction, which we showed with probing.
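The posts above don't say which dimensionality estimator was used. As a minimal sketch of what "dimensionality of a feature space" can mean, here is the participation ratio, one common linear measure. It is an assumption for illustration, not necessarily the paper's estimator, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of the rows of X (n_samples x n_features).

    PR = (sum_i lambda_i)^2 / sum_i lambda_i^2, where lambda_i are the
    eigenvalues of the feature covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio.
    s = np.linalg.svd(Xc, compute_uv=False)
    lam = s ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
n, d = 2000, 50

# Synthetic stand-ins for model activations (not real LLM features).
isotropic = rng.standard_normal((n, d))  # variance spread over all 50 axes
low_rank = rng.standard_normal((n, 3)) @ rng.standard_normal((3, d))  # 3 directions only

pr_iso = participation_ratio(isotropic)
pr_low = participation_ratio(low_rank)
print(f"isotropic: {pr_iso:.1f}, low-rank: {pr_low:.1f}")
```

The low-rank features occupy a 50-dimensional ambient space but have a participation ratio of at most 3, which is the sense in which a layer can be wide yet low-dimensional.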
Good news everyone!
I’ll be presenting the paper I did with Marco and Iuri, "Prediction Hubs are Context-Informed Frequent tokens in LLMs", at the ELLIS UnConference on December 2nd in Copenhagen. arxiv.org/abs/2502.10201
Do you use a pronoun more often when the entity you’re talking about is more predictable?
Previous work offers diverging answers, so we conducted a meta-analysis, combining data from 20 studies across 8 different languages.
Now out in Language: muse.jhu.edu/article/969615
Does dimensionality *cause* brain predictivity?
❌High-dimensional random features don't predict the brain!
➡️Learning good linguistic abstractions results in feature spaces that are higher-dimensional and more brain-like. Dimensionality per se is not a causal driver.
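A toy illustration of the logic of this control, on purely synthetic data. Everything here is an assumption for illustration: the ridge encoding model, the held-out Pearson score, and all names are hypothetical, and the "random" features are simply unrelated to the stimuli, which is a cruder control than the one in the paper.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    # Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def mean_pearson(Y_true, Y_pred):
    # Pearson r per response channel, averaged over channels
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
n_train, n_test, n_latent, n_voxels = 400, 200, 20, 30
n_total = n_train + n_test

# Synthetic stimulus structure that drives the synthetic "brain" responses.
latent = rng.standard_normal((n_total, n_latent))
brain = latent @ rng.standard_normal((n_latent, n_voxels))
brain += 0.5 * rng.standard_normal(brain.shape)

# 100-wide features that encode the structure (rank 20) ...
learned = latent @ rng.standard_normal((n_latent, 100))
# ... versus 100-wide, full-rank features with no relation to the stimuli.
random_feats = rng.standard_normal((n_total, 100))

def encoding_score(F, alpha=10.0):
    # Fit on the training split, score on the held-out split.
    W = ridge_fit(F[:n_train], brain[:n_train], alpha)
    return mean_pearson(brain[n_train:], F[n_train:] @ W)

score_learned = encoding_score(learned)
score_random = encoding_score(random_feats)
print(f"structured: {score_learned:.2f}, random: {score_random:.2f}")
```

In this toy setup the random features span more directions than the structured ones, yet only the structured ones predict held-out responses, so dimensionality alone is not enough.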
Both dimensionality, i.e., ability to represent complex features, and brain predictivity grow with linguistic capabilities over LLM training.
Explicitly increasing brain predictivity by finetuning layers on fMRI responses also *increased* both dimensionality and semantic content.
So far, dimensionality, linguistic abstraction, and brain predictivity seem related. But does dimensionality *cause* brain predictivity?