5/ Takeaway
LLMs do not always need to externalize their thoughts.
They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡
Full paper: arxiv.org/abs/2605.30343
Huge thanks to @hochreitersepp.bsky.social for the guidance!
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...