PhD at EPFL 🧠💻 Ex @MetaAI, @SonyAI, @Microsoft Egyptian 🇪🇬

Badr AlKhamissi

So why build toward a brain-encoding foundation model?
→ Simulate fMRI responses to sensory stimuli
→ Insights about the brain
→ Path toward clinical applications
MIRAGE is our first step. 🧠 More on the project site: mirage-brain.epfl.ch w/ @akgokce.bsky.social & @mschrimpf.bsky.social

Two simple ideas for building improved brain encoding models: 1. learn to use representations from all model layers via a gating mechanism + 2. start from natively multimodal features for multimodal predictions. State-of-the-art performance; see mirage-brain.epfl.ch for details #NeuroAI 🧠🤖🧪

🧠 When you watch a movie, your brain blends sight, sound, and speech into a single experience. Should models of the brain blend them too, or keep the senses separate until the very end? We built MIRAGE to find out. It sets a new SOTA for predicting whole-brain fMRI from movies. 🧵

Most brain-encoding pipelines pull vision, audio, and language features from separate models, then fuse them late, at the readout. But modern foundation models fuse modalities during pretraining. Which kind of fusion is actually more brain-relevant?
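To make the contrast concrete, here is a minimal sketch of the two readout styles for a single voxel. Everything in it is an illustrative assumption: the function names, the toy feature vectors, and the hand-picked weights are hypothetical and are not taken from the MIRAGE code or paper.

```python
def late_fusion_readout(vision, audio, text, weights, bias=0.0):
    """Late fusion: features come from three separate unimodal models and
    only meet at the readout, here by concatenation plus one linear map."""
    fused = list(vision) + list(audio) + list(text)
    if len(fused) != len(weights):
        raise ValueError("need one weight per concatenated feature")
    return sum(w * x for w, x in zip(weights, fused)) + bias


def native_fusion_readout(omni_features, weights, bias=0.0):
    """Native fusion: one multimodal model has already mixed the modalities,
    so the readout sees a single joint feature vector."""
    if len(omni_features) != len(weights):
        raise ValueError("need one weight per feature")
    return sum(w * x for w, x in zip(weights, omni_features)) + bias


# Toy example (hypothetical numbers): predicted response of one voxel.
late = late_fusion_readout([1.0, 2.0], [3.0], [4.0], [0.5, 0.5, 1.0, -1.0])
native = native_fusion_readout([0.2, 0.4, 0.6], [1.0, 1.0, 1.0])
```

The readout is the same kind of linear map in both cases; what differs is whether the modalities were mixed before the features reached it.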

Key finding: native fusion beats post-hoc fusion at every architectural level: linear ridge, brain encoder, and full MIRAGE. The kicker: on out-of-distribution movies, a single MIRAGE model beats TRIBE v1's 1,000-model ensemble! That gives a new SOTA on Algonauts 2025 OOD 🏆

And this isn't a quirk of one model. Across 2 backbone families and 3 scales, native fusion wins at every single scale. Fusing modalities during pretraining yields features that are more brain-aligned than stitching unimodal streams together afterward.

Enter MIRAGE 🪄 Most encoding models pin a linear readout to one fixed layer. MIRAGE drops both constraints. A frozen omni-modal backbone (Qwen3-Omni) exposes all 48 layers → per-modality cross-attention gates pool them adaptively → a transformer maps to cortex non-linearly, with a per-subject head.
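For intuition, the layer-pooling step can be sketched as attention over layers: a learned query scores each layer's features, a softmax turns the scores into weights, and the pooled feature is the weighted sum. This is a simplified sketch under stated assumptions, not the MIRAGE implementation: it uses a single query vector with plain scaled dot-product scores and no key/value projections, and `gate_layers` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_layers(layer_feats, query):
    """Pool per-layer features with attention weights over layers.

    layer_feats: list of L vectors, one per backbone layer, each of length d.
    query: vector of length d (in a real model this would be learned,
           with one query per modality; here it is just an input).
    Returns (pooled_vector, weights_over_layers).
    """
    d = len(query)
    # Scaled dot-product score between the query and each layer's features.
    scores = [
        sum(q * h for q, h in zip(query, feat)) / math.sqrt(d)
        for feat in layer_feats
    ]
    weights = softmax(scores)
    # Weighted sum over layers, computed one feature dimension at a time.
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, layer_feats))
        for i in range(d)
    ]
    return pooled, weights


# Toy usage: 48 layers of 2-d features and a hypothetical query.
layers = [[float(layer), 1.0] for layer in range(48)]
pooled, weights = gate_layers(layers, query=[0.1, 0.0])
```

With an all-zero query every layer gets the same weight and the pooled vector is the plain average over layers; a trained query shifts weight toward the layers that help prediction.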

Each modality also traces a distinct anatomical pattern: vision → occipitotemporal, audio → auditory cortex, text → the language network. MIRAGE's largest gains over our linear baseline land in visual & dorsal-attention areas, exactly where rich social-movie content demands integration.

Bonus: MIRAGE is inspectable. The gates' attention weights reveal which backbone layers each modality reads from. 👀 Vision is sharply tuned to mid-depth layers (~25–30), text spreads across mid-to-late layers, audio is the most diffuse.
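One way to read such gate weights is to summarize each modality's distribution over layers by its peak layer and by how spread out it is. The sketch below uses normalized entropy as the spread measure; that choice, the function name `summarize_gate`, and the toy weights are assumptions for illustration, not the analysis used in the paper. Layer indices here are 0-based.

```python
import math


def summarize_gate(weights):
    """Summarize non-negative attention weights over backbone layers.

    Returns (peak_layer, spread), where spread is the entropy of the
    normalized weights divided by log(number of layers):
    0.0 means all weight on one layer, 1.0 means perfectly uniform.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights]
    peak_layer = max(range(len(probs)), key=probs.__getitem__)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    spread = entropy / max_entropy if max_entropy > 0 else 0.0
    return peak_layer, spread


# Toy weights (hypothetical): one sharply tuned gate, one diffuse gate.
sharp = [0.0] * 48
sharp[27] = 1.0
diffuse = [1.0] * 48

sharp_peak, sharp_spread = summarize_gate(sharp)
diffuse_peak, diffuse_spread = summarize_gate(diffuse)
```

A gate that is sharply tuned to a few layers has a low spread value, while a diffuse gate sits close to 1.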

Plus, we have a demo: play a clip and watch MIRAGE's predicted whole-brain activity light up in sync 🧠🍿
📄 Paper: arxiv.org/abs/2605.29850
💻 Code: github.com/epflneuroail...
🤗 Model: huggingface.co/epfl-neuroai...
🌐 Demo: mirage-brain.epfl.ch
Joint work w/ @akgokce.bsky.social & @mschrimpf.bsky.social

Martin Schrimpf

mirage-brain.epfl.ch

MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding
