So apparently LLMs just update on false claims in documents, even when the documents explicitly label those claims as false.
arxiv.org/abs/2605.13829
We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheera...