Inlay

Our evaluations show:
✅ Significantly higher refusal accuracy on harmful prompts than both the abliterated and instruct (IT) models (Table 1)
✅ Minimal compromise on benign compliance and general abilities (Table 2)