
5/ Takeaway: LLMs do not always need to externalize their thoughts. They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation. 💡 Full paper: arxiv.org/abs/2605.30343. Huge thanks to @hochreitersepp.bsky.social for the guidance!

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...