//
sign in
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
Paper - arxiv.org/abs/2505.11711 Work done with amazing collaborator Lifan Yuan, and advised by our amazing advisors @dilekh.bsky.social and Hao Peng.
May 21, 2025
Reinforcement learning (RL) yields substantial improvements in large language models (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from upda...
arxiv.org
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee