To advance the family-based modelling approach, we are releasing the entire framework as open source:
ProFam Atlas: A curated, large-scale training corpus containing nearly 40 million protein families.
Code & Weights: github.com/alex-hh/prof...
Data: zenodo.org/records/1771...
Protein language models have become essential tools for engineering novel functional proteins. The emerging paradigm of family-based language models makes use of homologous sequences to steer protein ...