By conditioning on homologous sequences, ProFam-1 is competitive with the state of the art in zero-shot fitness prediction on ProteinGym, outperforming much larger PLMs such as ESM.
From structures to sequences, now Alex Bateman and the quest to annotate and classify all proteins!
Rob Finn on MGnify, everything bacteria and functions in different environments
Starting our afternoon session with a talk by Sameer Velankar, of PDBe and AFDB fame among other endeavours!
Built by CATH, TUM and NVIDIA, ProFam-1 is our new open-source protein family language model (pfLM) designed to generate functional protein variants and predict fitness using in-context example sequences.
And now @gonzaparra.bsky.social with his first talk as a PI, on protein frustration! Well done!
To advance the family-based modelling approach, we are releasing the entire framework as open source:
ProFam Atlas: A curated, large-scale training corpus containing nearly 40 million protein families.
Code & Weights: github.com/alex-hh/prof...
Data: zenodo.org/records/1771...
For design, ProFam-1 excels at homology-guided generation. It produces diverse sequences with low sequence identity to natural proteins while preserving predicted structural similarity and conservation patterns of the natural family, even when conditioning on just a single example sequence.
Now Maria Martín from UniProt is telling us how AI-based tools are shaping the future of one of the key resources for protein sequences and function.
It was lovely to speak at the CATH 30 symposium, celebrating 30 years of the @cathgene3d.bsky.social protein structure classification database. I was presenting recent work on our new generative protein-family language model: preprint coming soon.