And this isn't a quirk of one model. Across two backbone families and three scales, native fusion wins at every single scale. Fusing modalities during pretraining yields features that are more brain-aligned than those produced by stitching unimodal streams together afterward.