Check out our new paper on evaluating LLM agents on their preference for achieving their goal and avoiding human harm, called ManagerBench👔
Adi Simhi
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm?
🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵