Human turned into a virtual homo during PhD at Helmholtz Munich

Joel Valdivia Ortega

📢 Thanks a lot to @lorenzlamm.bsky.social, @marionjasnin.bsky.social, @tingyingpeng.bsky.social, Franziska Eckardt and Benedikt Schworm for all the help in this work ❤️ and to you for reading 😍 ! I would love to get any feedback, so please feel free to reach out! 🧵9/9

☝️ Topology preservation: By choosing appropriate amplitudes, we can preserve the topology of the latent space. The smaller the amplitude, the fewer changes to the topology are expected. 🧵5/9

Random embeddings create a radial anisotropy: the bigger the token's norm, the more it will be distorted by the random embedding. 🧵6/9
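To make the radial effect concrete, here is a minimal sketch. It assumes the random embedding acts like a zero-mean Gaussian linear map and that "distortion" means the norm of that map's output; both are illustrative assumptions, not the paper's exact definitions. The names `W`, `distortion`, `dim` and `amplitude` are hypothetical.

```python
import math
import random

rng = random.Random(0)
dim, amplitude = 16, 0.1  # hypothetical values, chosen only for illustration

# Assumption: the random embedding is a linear map with i.i.d. N(0, amplitude^2) entries.
W = [[rng.gauss(0.0, amplitude) for _ in range(dim)] for _ in range(dim)]

def distortion(token):
    """Norm of the random map applied to the token (assumed distortion measure)."""
    out = [sum(w * x for w, x in zip(row, token)) for row in W]
    return math.sqrt(sum(v * v for v in out))

# Fix a direction and only change the token's norm.
direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
base = distortion(direction)
for scale in (1.0, 2.0, 4.0):
    scaled = [scale * x for x in direction]
    print(scale, distortion(scaled) / base)
```

Under these assumptions the distortion grows in direct proportion to the token's norm, because a linear map satisfies W(c·x) = c·W(x).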

Random embeddings also create angular anisotropy: the less sparse the token is, the more distortion it will experience from the random embedding. 🧵7/9

🤟 Results: Thus, increasing the token's norm or having non-sparse representations becomes more costly for the ViT, promoting less repurposing, better representations and, as a result, better quantitative performance. 🧵8/9

🧑‍🍳 Contribution: We propose replacing the learnable parameters of the MLPs with random variables, turning them into random embeddings and creating the ✨Randomized-MLP (RMLP)✨. This architecture has one hyperparameter, the amplitude, which controls the standard deviation of those variables. 🧵4/9
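A rough sketch of the idea, not the code from the repo: an MLP whose weights are sampled instead of learned, with `amplitude` as the standard deviation. The zero-mean Gaussian, the ReLU, the two-layer shape, the lack of biases and all function names are assumptions made for illustration.

```python
import random

def sample_random_linear(in_dim, out_dim, amplitude, rng):
    """Draw an out_dim x in_dim matrix with i.i.d. N(0, amplitude^2) entries."""
    return [[rng.gauss(0.0, amplitude) for _ in range(in_dim)]
            for _ in range(out_dim)]

def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

def randomized_mlp(token, hidden_dim, out_dim, amplitude, rng):
    """Two-layer MLP whose weights are sampled on each call, never trained.

    Hypothetical sketch: the paper's RMLP may differ in depth, activation,
    mean of the random variables, and how often they are resampled.
    """
    w1 = sample_random_linear(len(token), hidden_dim, amplitude, rng)
    w2 = sample_random_linear(hidden_dim, out_dim, amplitude, rng)
    return matvec(w2, relu(matvec(w1, token)))

rng = random.Random(0)
token = [0.5, -1.0, 2.0, 0.0]
embedding = randomized_mlp(token, hidden_dim=8, out_dim=3, amplitude=0.1, rng=rng)
print(embedding)
```

In this sketch, setting `amplitude` to 0 collapses every output to zero, and larger values spread the random weights further.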

✨ Motivation: Token norms have been used to spot ViTs repurposing patch tokens in void regions of natural images to encode general information, and regularisation techniques have been developed to avoid this. We saw the same behaviour in regularised models when they were applied to medical images. 🧵2/9

In case you're missing my poster at #NeurIPS2025 about how I fine-tuned DINOv2 on ophthalmological images, here are some animations so you don't miss out! 🔗 Preprint: doi.org/10.48550/arX... 🔗 Code: github.com/peng-lab/rmlp 🧵1/9

🎨 Baseline: Taking the DINO and iBOT losses, let's consider a teacher that provides two stable classes. The student and its MLP then need to learn how to match that classification, and part of that learning might go to the MLP. 🧵3/9

I wrote a paper with @lorenzlamm.bsky.social, @marionjasnin.bsky.social, @tingyingpeng.bsky.social, F. Eckardt and B. Schworm, and it got accepted at #NeurIPS2025! Fun fact: 90% of viewers are enby vegans! 🤟 So if u fit there, u might wanna check it out! Maybe also if you don't. We're allies here <3

Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2

Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention and feature...

arxiv.org
