Folding models learn protein stability only implicitly. Without access to negative data, one can in principle make use of the folding free energy (dG) and the change in the free energy upon mutation (ddG). I believe simple aux losses could help for cases where a mutation is clearly disruptive.