We start by establishing a 𝐭𝐡𝐞𝐨𝐫𝐞𝐭𝐢𝐜𝐚𝐥 𝐬𝐚𝐟𝐞𝐭𝐲 𝐠𝐮𝐚𝐫𝐚𝐧𝐭𝐞𝐞: at a Nash equilibrium, the defender provides safe responses to ANY adversarial input (Theorem 1). This guarantee motivates our game-theoretic approach to safety alignment, which goes beyond empirical defenses.
Mickel Liu
Jun 12, 2025