I gave comments for a news piece in Science on a recent preprint about AI finding loopholes ("Large Language Models Hack Rewards, and Society", Liu et al.). Since these things are always cut short, I wanted to expand here: www.science.org/content/arti... arxiv.org/abs/2606.04075
5d
arxiv.org
Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to ...
Large Language Models Hack Rewards, and Society
Tomer Ullman