I gave comments for a news piece in Science on a recent preprint about AI finding loopholes ("Large Language Models Hack Rewards, and Society", Liu et al.).
Since quotes in these pieces always get cut short, I wanted to expand here:
www.science.org/content/arti...
arxiv.org/abs/2606.04075