Inlay

This reframes the folding problem as: what determines the burial of the hard-to-predict core residues? The core identity score is available on GitHub with a Google Colab notebook. Try it on your own structures! 8/8 Link: github.com/agrigas115/core_identity_score

(1/n) Does basal stem cell division orientation regulate skin stratification and tissue mechanics? And can tissue mechanics feed back to control division orientation? In our new preprint, we use a 3D vertex model to explore this @manningresearch.bsky.social @somiealo.bsky.social

Excited to highlight a new preprint about mechanical contributions to tissue homeostasis, from the Manning group in collaboration with the amazing Carien Niessen and Sara Wickstrom @sarawickstrom.bsky.social labs, spearheaded by Dr. Somiealo Azote: www.biorxiv.org/content/10.6...

Can hydrophobicity scales identify the correct core? The textbook picture says hydrophobic collapse drives folding. But ~23% of incorrectly folded models have cores that are more hydrophobic than the native fold. Current scales can't solve core identity by maximization. 7/8
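The comparison in this post can be sketched in a few lines. This is only an illustration: the post does not say which hydrophobicity scale or which aggregate was used, so the Kyte-Doolittle scale, the mean over core residues, and the function names below are all assumptions.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_core_hydrophobicity(sequence, core_mask):
    """Average hydropathy over residues labeled as core (mask value 1)."""
    values = [KYTE_DOOLITTLE[aa] for aa, in_core in zip(sequence, core_mask) if in_core]
    if not values:
        raise ValueError("core_mask selects no residues")
    return sum(values) / len(values)


def decoy_beats_native(sequence, native_core, decoy_core):
    """True if the decoy's core is more hydrophobic than the native core.

    Both masks refer to the same sequence; only the core assignment differs.
    """
    return (mean_core_hydrophobicity(sequence, decoy_core)
            > mean_core_hydrophobicity(sequence, native_core))


# Toy example: a 4-residue sequence with two different core assignments.
sequence = "ILKD"
native_core = [0, 0, 1, 0]   # hypothetical native core: only K is buried
decoy_core = [1, 0, 0, 0]    # hypothetical decoy core: only I is buried
print(decoy_beats_native(sequence, native_core, decoy_core))
```

Whenever a check like this returns True for an incorrect model, maximizing core hydrophobicity would pick the wrong core, which is the failure mode the post describes.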

To compare fairly, we measure bits/residue by accounting for label entropy and sending random subsets of the true labels. Core identity reaches ρ=0.9 at just 0.4 bits/residue, versus 0.68 for contacts and 0.58 for 3Di. It's the most efficient encoding we tested. 4/8
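One rough way to think about bits/residue for a binary encoding is sketched below. It assumes that every label is independent with a single shared frequency and that only a fraction of the labels is sent; the preprint's exact accounting may differ, and the function names are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary label that equals 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bits_per_residue(labels, n_residues, fraction_sent=1.0):
    """Estimated information sent per residue for a list of binary labels.

    Assumes independent labels (an upper bound when labels are correlated).
    """
    p = sum(labels) / len(labels)
    return fraction_sent * len(labels) * binary_entropy(p) / n_residues


# Toy example: 4 residues, half of them in the core.
core = [1, 0, 1, 0]
print(bits_per_residue(core, n_residues=4))                     # all labels sent
print(bits_per_residue(core, n_residues=4, fraction_sent=0.5))  # half the labels sent
```

Under this accounting a per-residue encoding can never exceed 1 bit/residue, while a pairwise encoding grows with protein length unless its labels are sparse or only a small subset is sent.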

What about predicting from sequence alone? We trained a lightweight predictor on ESM2 embeddings for burial and compared to ESM2-predicted contacts. Predicting burial from sequence gives a better LDDT correlation than using contacts (ρ=0.82 vs 0.75), and combining the two doesn't help. 5/8

How much information does it take to fold a protein? Not much, if you use the right information! We find that residue burial, a binary label of core vs surface, encodes a protein's fold highly efficiently and even improves ESM2's structure representation. 1/8 www.biorxiv.org/content/10.6...

To test this, we encode ~24,000 CASP structural models using different representations - contact maps (N(N-1)/2 pairwise binary labels) and core identity (N binary labels) for example - and ask: how well does each predict the accuracy of the backbone (LDDT)? 2/8
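To make the size difference between the two encodings concrete, here is a minimal sketch. The 8 Å Cα cutoff, the relative-SASA threshold, and the function names are illustrative assumptions, not the definitions used in the preprint.

```python
from itertools import combinations
from math import dist


def contact_labels(ca_coords, cutoff=8.0):
    """One binary label per residue pair: N(N-1)/2 labels in total.

    ca_coords is a list of (x, y, z) Cα positions in Å; the cutoff is assumed.
    """
    return [int(dist(a, b) < cutoff) for a, b in combinations(ca_coords, 2)]


def core_labels(rel_sasa, threshold=0.1):
    """One binary label per residue: N labels in total.

    rel_sasa holds one relative solvent-accessible surface area per residue;
    a residue counts as core when it falls below the assumed threshold.
    """
    return [int(r < threshold) for r in rel_sasa]


# Toy 4-residue example with made-up coordinates and accessibilities.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
rel_sasa = [0.50, 0.02, 0.05, 0.70]

print(len(contact_labels(coords)), contact_labels(coords))
print(len(core_labels(rel_sasa)), core_labels(rel_sasa))
```

For a 100-residue protein the contact map holds 4,950 labels while core identity holds 100, which is why a per-residue normalization is needed before comparing the two.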

Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein's backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein's native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from FoldSeek's 3Di. Even when the folded structure is unavailable, predicting each residue's burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein's native fold can be re-framed as predicting each residue's core identity.

### Competing Interest Statement

The authors have declared no competing interest.

Chan Zuckerberg Initiative (United States), 2023-329572; NIH, T32GM145452