It is not easy to characterize the features represented by the human language cortex. This work is a step toward doing so.
Using small, interpretable feature sets, we explain language-network responses and show a shared feature basis across brain regions, with graded variation across individuals.
Greta Tuckute
🚨New preprint!🚨
We know that LM representations can be used to predict brain responses to language. But what *features* of these representations underlie this alignment? We use sparse autoencoders (SAEs) to find out!