🎉 Excited to share that the last paper of my PhD is now published in PRX Life! We introduce RAG-ESM, a retrieval-augmented framework that makes pretrained protein language models (like ESM2) homology-aware with minimal training cost. 📄 Paper: journals.aps.org/prxlife/abst...
[3/8] 🧬 Encoding strategy: Instead of positional encoding, ProteomeLM introduces a functional encoding based on orthologous groups. The model can thus leverage both this functional encoding and the context provided by the other proteins. This is especially important in eukaryotes, where gene order is less conserved.
Damiano Sgarbossa
Apr 11, 2025
📢 Our new preprint is out on bioRxiv! We introduce RAG-ESM, a retrieval-augmented framework that improves pretrained protein language models like ESM2 by making them homology-aware with minimal additional training costs. 🔗 doi.org/10.1101/2025... 💻 github.com/Bitbol-Lab/r... 1/7
Protein-protein interactions studied by @cyrilmalbranke.bsky.social #PragueBioML @elixircz.bsky.social
Cyril Malbranke
[2/8] 🧬 Training objective: ProteomeLM uses a custom masked language modeling task, predicting masked ESM-C representations of proteins within the proteome.
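To make the objective in [2/8] concrete, here is a minimal sketch of masked reconstruction over per-protein embeddings. Everything in it is an illustrative assumption rather than ProteomeLM's actual code: the post does not state the loss, so mean squared error is used; the zero vector stands in for a mask token; `mean_context_predictor` is a toy placeholder for the transformer; and the 15% mask fraction is arbitrary.

```python
import random


def masked_reconstruction_loss(embeddings, predict, mask_fraction=0.15, seed=0):
    """Mean squared error computed on the masked proteins only.

    embeddings: list of equal-length float lists, one per protein (needs >= 2).
    predict: hypothetical model, called as predict(corrupted, masked_indices).
    """
    rng = random.Random(seed)
    n = len(embeddings)
    dim = len(embeddings[0])
    n_masked = max(1, round(mask_fraction * n))
    masked = set(rng.sample(range(n), n_masked))

    # Assumption: a masked protein is replaced by a zero vector.
    corrupted = [
        [0.0] * dim if i in masked else list(e)
        for i, e in enumerate(embeddings)
    ]
    predictions = predict(corrupted, masked)

    total = 0.0
    for i in masked:
        squared = sum((p - t) ** 2 for p, t in zip(predictions[i], embeddings[i]))
        total += squared / dim
    return total / n_masked


def mean_context_predictor(corrupted, masked):
    """Toy stand-in for the transformer: predicts the mean visible embedding."""
    visible = [e for i, e in enumerate(corrupted) if i not in masked]
    dim = len(corrupted[0])
    mean = [sum(e[d] for e in visible) / len(visible) for d in range(dim)]
    return [mean for _ in corrupted]


# Toy "proteome" of five proteins with 2-dimensional embeddings.
proteome = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5], [1.0, 2.0], [2.0, 3.0]]
loss = masked_reconstruction_loss(proteome, mean_context_predictor)
```

The only point of the sketch is that the loss is taken over masked positions, so the model has to infer a protein's representation from the rest of the proteome.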
[8/8] 💻 Resources: • Training dataset • 4 pre-trained models (XS → L) • Code & interactive notebooks 🔗 huggingface.co/collections/... 🔗 github.com/Bitbol-Lab/P...
Vojtech Spiwok
[7/8] 📊 In conclusion, results show strong performance across species and benchmarks for both PPI prediction and gene essentiality. ProteomeLM makes proteome-wide analysis more practical, easing large-scale studies, including in complex eukaryotic proteomes.
[6/8] 🎯 Beyond PPIs: ProteomeLM predicts gene essentiality across diverse taxa (e.g. E. coli, yeast, minimal cells), highlighting its potential for broad downstream applications.
[4/8] 🎯 Key finding: Attention heads spontaneously encode protein–protein interaction networks. Some heads can reach an AUC of 0.92 in discriminating interacting vs non-interacting pairs.
[5/8] ⚡ This allows unsupervised and supervised PPI prediction at proteome scale in minutes, several orders of magnitude faster than coevolution-based methods such as DCA. Try it here: github.com/Bitbol-Lab/P...
[1/8] 📄 New preprint! With Gionata Paolo Zalaffi & Anne-Florence Bitbol, we introduce ProteomeLM, a transformer that processes entire proteomes (prokaryotes and eukaryotes), enabling ultra-fast protein–protein interaction (PPI) prediction across the tree of life. 🔗 www.biorxiv.org/content/10.1...
Language models starting from biological sequence data are advancing many inference problems, both at the scale of single proteins, and at the scale of genomic neighborhoods. In this paper, we introduce ProteomeLM, a transformer-based language model that reasons on entire proteomes from species spanning the tree of life. Leveraging protein language model embeddings, ProteomeLM is trained to reconstruct masked protein embeddings using the whole proteomic context. It thus learns contextualized protein representations reflecting proteome-scale functional constraints. We show that ProteomeLM spontaneously captures protein-protein interactions (PPI) in its attention coefficients. We demonstrate that it screens whole interactomes orders of magnitude faster than amino-acid coevolution-based methods, and substantially outperforms them. We further develop ProteomeLM-PPI, a supervised PPI prediction network that combines ProteomeLM embeddings and attention coefficients, and achieves state-of-the-art performance across species and benchmarks. Finally, we introduce ProteomeLM-Ess, a supervised predictor of gene essentiality that generalizes across diverse taxa. Our results highlight the power of proteome-scale language models for addressing function and interactions at the organism level.
Competing Interest Statement: The authors have declared no competing interest.
European Research Council, https://ror.org/0472cxd90, 851173
www.biorxiv.org
ProteomeLM: A proteome-scale language model allowing fast prediction of protein-protein interactions and gene essentiality across taxa