
📢 Our new preprint is out on bioRxiv! We introduce RAG-ESM, a retrieval-augmented framework that improves pretrained protein language models like ESM2 by making them homology-aware with minimal additional training costs. 🔗 doi.org/10.1101/2025... 💻 github.com/Bitbol-Lab/r... 1/7

[1/8] 📄 New preprint! With Gionata Paolo Zalaffi & Anne-Florence Bitbol, we introduce ProteomeLM, a transformer that processes entire proteomes (prokaryotes and eukaryotes), enabling ultra-fast protein–protein interaction (PPI) prediction across the tree of life. 🔗 www.biorxiv.org/content/10.1...

Language models trained on biological sequence data are advancing many inference problems, both at the scale of single proteins and at the scale of genomic neighborhoods. In this paper, we introduce ProteomeLM, a transformer-based language model that reasons on entire proteomes from species spanning the tree of life. Leveraging protein language model embeddings, ProteomeLM is trained to reconstruct masked protein embeddings using the whole proteomic context. It thus learns contextualized protein representations reflecting proteome-scale functional constraints. We show that ProteomeLM spontaneously captures protein-protein interactions (PPI) in its attention coefficients. We demonstrate that it screens whole interactomes orders of magnitude faster than amino-acid coevolution-based methods, and substantially outperforms them. We further develop ProteomeLM-PPI, a supervised PPI prediction network that combines ProteomeLM embeddings and attention coefficients, and achieves state-of-the-art performance across species and benchmarks. Finally, we introduce ProteomeLM-Ess, a supervised predictor of gene essentiality that generalizes across diverse taxa. Our results highlight the power of proteome-scale language models for addressing function and interactions at the organism level.

### Competing Interest Statement

The authors have declared no competing interest.

European Research Council, https://ror.org/0472cxd90, 851173
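To make the training idea in the abstract concrete (reconstructing masked protein embeddings from the rest of the proteome), here is a minimal sketch in NumPy. It is not the ProteomeLM architecture or code: the single-head attention layer with random, untrained weights, the zero vector used as a mask, the mean-squared-error loss, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

def mask_embeddings(emb, mask_frac, rng):
    """Replace a random subset of protein embeddings with a zero vector."""
    n = emb.shape[0]
    n_masked = max(1, int(mask_frac * n))
    masked_idx = rng.choice(n, size=n_masked, replace=False)
    masked = emb.copy()
    masked[masked_idx] = 0.0  # assumption: zero vector stands in for a mask token
    return masked, masked_idx

def toy_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over all proteins of one proteome."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)  # each row sums to 1
    return attn @ v, attn

def masked_reconstruction_loss(pred, target, masked_idx):
    """Mean squared error computed only at the masked positions."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

# Hypothetical toy proteome: 50 proteins, each with a 16-dimensional embedding.
rng = np.random.default_rng(0)
n_proteins, dim = 50, 16
emb = rng.normal(size=(n_proteins, dim))

# Random, untrained projection matrices (illustration only).
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

masked, masked_idx = mask_embeddings(emb, 0.15, rng)
pred, attn = toy_self_attention(masked, w_q, w_k, w_v)
loss = masked_reconstruction_loss(pred, emb, masked_idx)
```

In this toy setup, `attn[i, j]` is the weight protein `i` places on protein `j` when its representation is rebuilt from context. The abstract reports that attention coefficients of the trained model capture protein-protein interactions; nothing of the sort should be expected from the random weights used here.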