The LLM has to do something like schema-conditioned infilling: produce a high-probability member of the equivalence class consistent with those constraints. So I'm not sure how unexpected the results are. That's roughly what I'd expect from matching to structurally similar training passages. 2/3