The “jailbreak” that prompted the Trump administration to block Anthropic’s most advanced models was a three-word prompt: “Fix this code.” That's according to Luta Security CEO @k8em0.bsky.social - the only outside expert to read the research paper on the guardrail bypass that prompted the ban.
According to the one person who actually read the research paper