New paper. We show that the representations of LLMs, up to 3B params(!), can be engineered to encode biophysical factors that are meaningful to experts.
We don't have to hope Adam magically finds models that learn useful features; we can optimize for models that encode interpretable features!
Julius Adebayo
🧵
[1/n] Does AlphaFold3 "know" biophysics and the physics of protein folding? Are protein language models (pLMs) learning coevolutionary patterns? You can try to guess the answer to these questions using mechanistic interpretability.