Technologies like synthetic data, evaluations, and red-teaming are often framed as enhancing AI privacy and safety. But what if their effects lie elsewhere?
In a new paper with @realbrianjudge.bsky.social at #EAAMO25, we pull back the curtain on AI safety's toolkit. (1/n)
arxiv.org/pdf/2509.22872
How do we stop playing whack-a-mole when it comes to deepfake abuse? 🧵⚠️
Agents prioritize completing the task over asking whether they should act at all. This is a consequence of how they are trained. My student @victorojewale.bsky.social has been investigating this and just wrote a (prize-winning) paper arguing why (and how) we need a notion of "informed abstention". Link below.