The Breach: When AI Starts to Cheat 🚨
New research by PIs Jonas Geiping and Maksym Andriushchenko points to a major blind spot in AI safety. Most safety systems plan to rely on a smaller monitor model to oversee an untrusted AI agent.
👉 institute-tue.ellis.eu/en/news/the-...