To reproduce our unconstrained results, the quickstart notebook samples 30k SMILES and computes the six metrics from our table (Validity, Unique@1k, IntDiv, FCD, SNN, Frag/Scaf) in one cell.
github.com/chandar-lab/...
Code for the paper "NovoMolGen: Rethinking Molecular Language Model Pretraining" - chandar-lab/NovoMolGen
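For readers who want a feel for the first two metrics before opening the notebook, here is a minimal sketch of Validity and Unique@k under the usual MOSES-style definitions. It is not the notebook's code: `validity_and_unique_at_k` and the `canonicalize` callback are hypothetical names, and the callback is assumed to return a canonical SMILES string for a parseable molecule and `None` otherwise (in practice it could wrap an RDKit parse and re-serialization).

```python
from typing import Callable, Dict, Optional, Sequence


def validity_and_unique_at_k(
    samples: Sequence[str],
    canonicalize: Callable[[str], Optional[str]],
    k: int = 1000,
) -> Dict[str, float]:
    """Sketch of two generation metrics (hypothetical helper).

    Validity: fraction of samples that canonicalize to a molecule.
    Unique@k: fraction of distinct molecules among the first k valid ones.
    """
    # Keep only samples the (assumed) canonicalizer accepts.
    canonical = [c for c in map(canonicalize, samples) if c is not None]
    validity = len(canonical) / len(samples) if samples else 0.0

    # Uniqueness is measured on the first k valid molecules only.
    head = canonical[:k]
    unique = len(set(head)) / len(head) if head else 0.0

    return {"validity": validity, f"unique@{k}": unique}
```

With 30k samples and `k=1000`, only the first 1,000 valid molecules enter the uniqueness score, so it is cheap to compute; IntDiv, FCD, SNN, and Frag/Scaf need fingerprints or a reference set and are left to the notebook.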