Inlay

Results (3/4) – With evidence, models strongly adhere to it with high confidence, even for toxic or nonsensical interventions – Implausibility awareness is transient; once evidence appears, models rarely flag problems – Scaling, medical fine-tuning, and skeptical prompting offer little protection