To optimize for your own objective, start with this finetuning notebook. It defines a reward (via MoleculeEvaluator or your own function), builds our AugmentedHC trainer, and runs a short training loop so you can quickly compare “before vs after” results.
github.com/chandar-lab/...
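If you plan to supply your own reward function, the sketch below shows one possible shape for it, along with a simple "before vs after" comparison. It is only an illustration: the notebook's actual reward signature, and the MoleculeEvaluator and AugmentedHC APIs, are not shown here, so `toy_reward`, `score_batch`, and `mean_reward` are hypothetical names and the length-based score is a stand-in for a real chemical objective.

```python
from statistics import mean
from typing import Callable, List


def toy_reward(smiles: str, target_len: int = 20) -> float:
    """Hypothetical reward in [0, 1]: highest when the string has target_len characters.

    A real objective would score chemistry (for example via MoleculeEvaluator);
    string length is used here only to keep the sketch dependency-free.
    """
    if not smiles:
        return 0.0
    return max(0.0, 1.0 - abs(len(smiles) - target_len) / target_len)


def score_batch(batch: List[str], reward_fn: Callable[[str], float]) -> List[float]:
    """Apply a per-molecule reward function to a batch of SMILES strings."""
    return [reward_fn(s) for s in batch]


def mean_reward(batch: List[str], reward_fn: Callable[[str], float]) -> float:
    """Average reward over a batch, used for the before/after comparison."""
    return mean(score_batch(batch, reward_fn))


# Placeholder samples standing in for model outputs before and after finetuning.
before = ["C", "CC", "CCO"]
after = ["C" * 18, "C" * 20, "C" * 22]

print(f"mean reward before: {mean_reward(before, toy_reward):.3f}")
print(f"mean reward after:  {mean_reward(after, toy_reward):.3f}")
```

Keeping the reward as a plain function from a SMILES string to a float makes it easy to test on its own before handing it to a trainer. Check the notebook for the signature it actually expects, since it may differ from this one.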