Thrilled to share our new preprint on Reinforcement Learning for Reverse Engineering (RLRE) 🚀
We show that human preferences can be effectively reverse-engineered by pipelining LLMs to optimise upstream preambles via reinforcement learning 🧵⬇️