To advance the family-based modelling approach, we are releasing the entire framework open source:
ProFam Atlas: A curated, large-scale training corpus containing nearly 40 million protein families.
Code & Weights: github.com/alex-hh/prof...
Data: zenodo.org/records/1771...

For design, ProFam-1 excels at homology-guided generation. It produces diverse sequences with low sequence identity to natural proteins while preserving predicted structural similarity and conservation patterns of the natural family, even when conditioning on just a single example sequence.

By conditioning on homologous sequences, ProFam-1 is competitive with state-of-the-art zero-shot fitness prediction on ProteinGym, outcompeting much larger PLMs such as ESM.

Built by CATH, TÜM and NVIDIA, ProFam-1 is our new open-source protein family language model (pfLM) designed to generate functional protein variants and predict fitness using in-context example sequences.

It was lovely to speak at the CATH 30 symposium, celebrating 30 years of the @cathgene3d.bsky.social protein structure classification database. I was presenting recent work on our new generative protein-family language model: preprint coming soon.

www.biorxiv.org

Protein language models have become essential tools for engineering novel functional proteins. The emerging paradigm of family-based language models makes use of homologous sequences to steer protein ...

ProFam: Open-Source Protein Family Language Modelling for Fitness Prediction and Design

CATH-Gene3D