This difference is especially pronounced for proteins with few homologues, such as de novo proteins.
Additionally, we conducted an initial study of protein folding pathways! We ran direct Langevin simulations on Protein G, NuG2, and Protein L. We found that Protein G folds through its C terminus, while folding in NuG2 and Protein L shifts toward the N terminus, which is consistent with experiment!
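For readers unfamiliar with the technique: a direct Langevin simulation repeatedly steps downhill on an energy while injecting thermal noise. The sketch below is a toy 1D version, not ProteinEBM's simulation code. It assumes overdamped Langevin dynamics and uses a hypothetical double-well energy `E(x) = (x**2 - 1)**2` as a stand-in for a learned landscape, with illustrative `step` and `kT` values.

```python
import numpy as np

def grad_double_well(x):
    # Hypothetical toy energy E(x) = (x**2 - 1)**2, so dE/dx = 4 x (x**2 - 1)
    return 4.0 * x * (x ** 2 - 1.0)

def langevin(x0, grad_energy, step, kT, n_steps, rng):
    """Overdamped Langevin: x <- x - step * grad E(x) + sqrt(2 * step * kT) * noise."""
    x = np.array(x0, dtype=float)
    noise_scale = np.sqrt(2.0 * step * kT)
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + noise_scale * rng.standard_normal(x.shape)
    return x

# 1000 independent walkers started on top of the barrier at x = 0
rng = np.random.default_rng(0)
final = langevin(np.zeros(1000), grad_double_well, step=0.01, kT=0.1, n_steps=2000, rng=rng)
print("fraction of walkers in a well (|x| > 0.5):", np.mean(np.abs(final) > 0.5))
```

At this temperature nearly all walkers leave the barrier and settle into one of the two wells at x = ±1.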
Diffusion models actually learn a series of time-indexed energy landscapes, each corrupted with a different amount of noise. The ranking ability of ProteinEBM peaked slightly above t=0. Inspired by this finding, we trained an "expert" model only on low time levels (small t, i.e. little noise), which we call ProteinEBM-x.
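The idea of time-indexed landscapes can be seen in a toy 1D case where the noisy energy has a closed form: if clean data follow a Gaussian mixture and the noise at time t adds N(0, σ(t)²), then E_t(x) = -log p_t(x) is again a Gaussian mixture with every variance widened by σ(t)². The two-well mixture below is a hypothetical stand-in for a conformational landscape, not ProteinEBM's model. At small σ the wells sit far below the barrier at x = 0; at large σ the barrier disappears and the energy no longer separates the two states, which gives some intuition for why ranking works best near t = 0.

```python
import numpy as np

def noisy_energy(x, means, stds, weights, sigma_t):
    """Energy -log p_t(x) of a 1D Gaussian mixture convolved with N(0, sigma_t**2) noise."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # shape (n, 1)
    means = np.asarray(means, dtype=float)                     # shape (k,)
    var = np.asarray(stds, dtype=float) ** 2 + sigma_t ** 2    # noise widens every component
    log_comp = (np.log(np.asarray(weights, dtype=float))
                - 0.5 * np.log(2.0 * np.pi * var)
                - 0.5 * (x - means) ** 2 / var)                # shape (n, k)
    # log-sum-exp over mixture components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_p

# Hypothetical two-state landscape: wells at x = -2 and x = +2
means, stds, weights = [-2.0, 2.0], [0.5, 0.5], [0.5, 0.5]
for sigma_t in (0.1, 3.0):
    e_barrier, e_well = noisy_energy([0.0, 2.0], means, stds, weights, sigma_t)
    print(f"sigma_t={sigma_t}: E(0) - E(2) = {e_barrier - e_well:.3f}")
```

The gap E(0) - E(2) is positive (a real barrier) at sigma_t = 0.1 and negative at sigma_t = 3.0, where the two wells have merged into one.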
To recap our original method, we used denoising score matching to train an energy-based model that approximates the free energies of protein conformations. We found that this worked well for ranking protein structures and predicting the effects of mutations on stability.
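As a reminder of how denoising score matching trains an energy, here is a toy sketch, not ProteinEBM's training code. It assumes a hypothetical one-parameter energy `E_a(x) = 0.5 * a * x**2` on 1D data and a single noise level `sigma`. The model score is `-dE/dx = -a * x`, and DSM regresses it onto the score of the Gaussian corruption kernel, `-eps / sigma`. Because the loss is quadratic in `a`, its minimizer has a closed form.

```python
import numpy as np

def fit_quadratic_energy_dsm(data, sigma, rng):
    """Fit E_a(x) = 0.5 * a * x**2 by denoising score matching at noise level sigma.

    Loss: mean((-a * x_noisy + eps / sigma)**2). Setting d/da to zero gives
    a = mean(x_noisy * eps / sigma) / mean(x_noisy**2).
    """
    eps = rng.standard_normal(data.shape)
    x_noisy = data + sigma * eps            # corrupt clean samples with Gaussian noise
    return np.mean(x_noisy * eps / sigma) / np.mean(x_noisy ** 2)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 2.0, size=200_000)   # toy "clean" data with std s = 2
a_hat = fit_quadratic_energy_dsm(data, sigma=1.0, rng=rng)
print("fitted a:", a_hat, "  1 / (s**2 + sigma**2):", 1.0 / (2.0 ** 2 + 1.0 ** 2))
```

The fitted `a` approaches `1 / (s**2 + sigma**2)` (0.2 here), i.e. DSM recovers the energy of the noise-corrupted data distribution; only as `sigma` goes to 0 does it approach the clean-data value `1 / s**2`.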
ProteinEBM-x gives a huge boost in performance, in both structure ranking and stability prediction. In particular, ProteinEBM-x achieves state-of-the-art results in zero-shot stability ranking on ProteinGym, outperforming protein language models (PLMs) with over 15x as many parameters.
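Zero-shot stability ranking with an energy model comes down to scoring each mutant by how much it changes the predicted energy, then comparing that ranking with experiment using a rank correlation (Spearman's ρ is ProteinGym's headline metric). The sketch below assumes a hypothetical sign convention (lower energy = more stable, so score = E_wt - E_mut) and made-up toy numbers; it is not ProteinGym's evaluation code. The simple Spearman implementation also assumes there are no tied values.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation; this simple version assumes no ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def zero_shot_scores(energy_wt, energies_mut):
    """Higher score = predicted more stable (hypothetical convention: lower energy is better)."""
    return energy_wt - np.asarray(energies_mut, dtype=float)

# Toy numbers for illustration only
energy_wt = 0.0
energies_mut = [1.0, 0.5, 2.0, 0.1]            # hypothetical model energies for four mutants
measured_stability = [-1.0, -0.4, -2.5, 0.0]   # hypothetical measurements, higher = more stable
rho = spearman(zero_shot_scores(energy_wt, energies_mut), measured_stability)
print(f"Spearman rho = {rho:.2f}")
```

The toy mutants happen to be ranked perfectly, so rho is 1 here; real benchmarks compute this per assay and average across assays.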
If you think this seems cool and useful, check out our updated preprint and GitHub:
www.biorxiv.org/content/10.6...
github.com/jproney/Prot...
I'm excited to announce some major updates to our ProteinEBM paper with Chenxi Ou @sokrypton.org!
jproney