Inlay

Molecular simulations of proteins are well known to be computationally expensive. Here, we present a new latent-space-based method for modeling protein conformational flexibility at low computational cost. The method is data-driven and employs an autoencoder-based machine learning model for reversible dimensionality reduction of diverse conformations of the protein studied. Samples are then drawn from the low-dimensional latent space by Monte Carlo sampling. The folding and unfolding of the miniproteins can be sampled in minutes of computational time. We validated the method on four model systems: Tryptophan Cage, a nonfolding variant of Tryptophan Cage, Villin headpiece, and the human β-2-syntrophin PDZ domain (miniproteins with 20, 20, 35, and 95 residues, respectively). All systems were modeled at all-atom resolution. For the Tryptophan Cage and Villin miniproteins, the populations of folded and unfolded states sampled by the Monte Carlo simulations are very similar to those in the reference MD trajectories calculated by D. E. Shaw Research.
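
The idea of Monte Carlo sampling in a latent space can be illustrated with a minimal Python sketch. It is not the authors' implementation: the decoder (`toy_decode`), the energy function (`toy_energy`), the latent dimensionality, the step size, and the Metropolis acceptance rule are all assumptions chosen for illustration. In practice the decoder would be the trained autoencoder and the energy would come from a molecular force field.

```python
import numpy as np

LATENT_DIM = 2   # assumed latent dimensionality (illustrative)
N_COORDS = 12    # assumed size of the decoded coordinate vector (toy value)

# Fixed random weights standing in for a trained decoder (hypothetical).
_W = np.random.default_rng(0).normal(size=(LATENT_DIM, N_COORDS))

def toy_decode(z):
    """Hypothetical decoder: maps a latent vector to toy 'coordinates'."""
    return np.tanh(z @ _W)

def toy_energy(x):
    """Hypothetical energy: a harmonic placeholder, not a force field."""
    return 0.5 * float(np.sum(x ** 2))

def latent_metropolis(n_steps, step_size=0.3, kT=1.0, seed=1):
    """Metropolis random walk in latent space, scored on decoded structures."""
    rng = np.random.default_rng(seed)
    z = np.zeros(LATENT_DIM)
    e = toy_energy(toy_decode(z))
    samples = np.empty((n_steps, LATENT_DIM))
    accepted = 0
    for i in range(n_steps):
        # Propose a Gaussian move in the low-dimensional latent space.
        z_new = z + step_size * rng.normal(size=LATENT_DIM)
        e_new = toy_energy(toy_decode(z_new))
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / kT):
            z, e = z_new, e_new
            accepted += 1
        samples[i] = z  # a rejected move repeats the current state
    return samples, accepted / n_steps

samples, acc_rate = latent_metropolis(1000)
print(f"acceptance rate: {acc_rate:.2f}")
```

Because each move changes only a few latent variables while the decoder produces a full structure, a single step can correspond to a large conformational change. This sketch applies the Metropolis rule directly to latent coordinates and ignores any correction for the decoder's nonlinear mapping, so it should be read as a conceptual illustration only.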