Fresh from bioRxiv: our latest work introducing The Embedded Alphabet (TEA), a powerful new representation for protein sequences obtained by discretising ESM2 embeddings into a 20-character alphabet.
Pre-print: www.biorxiv.org/content/10.1...
🧵 (1/n)
Detecting remote homology with speed and sensitivity is crucial for tasks like function annotation and structure prediction. We introduce a novel approach using contrastive learning to convert protein...