Reward hacking: one example of why AI firms definitely need game theorists on their teams. www.anthropic.com/research/eme...
From shortcuts to sabotage: natural emergent misalignment from reward hacking
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
www.anthropic.com
Ben Greiner