5/ Takeaway
LLMs do not always need to externalize their thoughts.
They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡
Full paper:
arxiv.org/abs/2605.30343
Huge thanks to @hochreitersepp.bsky.social for the guidance!
ConGLUDe significantly accelerates drug design: protein–small-molecule interactions can be screened orders of magnitude faster.
xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370
sLSTM models fine-grained temporal structures, while mLSTM finds large-scale global patterns.
xLSTM achieves an AUC beyond 0.99, a TPR above 98%, and an FPR below 1%, and is robust to noise, lens type, and lens mass.
Cool xLSTM application.
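For readers less familiar with the metrics: TPR and FPR follow directly from confusion-matrix counts. A minimal sketch, where the function name `rates` and the counts are made up for illustration and are not taken from the paper:

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of true lensed signals that are detected
    fpr = fp / (fp + tn)  # share of unlensed signals wrongly flagged as lensed
    return tpr, fpr


# Hypothetical counts, chosen only to show the arithmetic.
tpr, fpr = rates(tp=985, fn=15, fp=8, tn=992)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```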
A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini...
The model is used as a market-state encoder, while reinforcement learning handles the trading decisions. These xLSTM results should be confirmed by further investigation.
xLSTM for Financial Time Series: arxiv.org/abs/2603.01820
"VLSTM achieved the highest overall Sharpe ratio"
"VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics"
"xLSTM achieves the highest portfolio-level cost buffer"
xLSTM excels at financial time series, as TiRex does.
Comparison of sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet: arxiv.org/abs/2606.12364
Comparison of xLSTM, Mamba-2, and Gated DeltaNet on code pre-training, distillation, and time-series.
xLSTM outperforms the others due to its gating scheme and state tracking.
RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199
Chunking enables linear-time, constant-memory inference, as TFLA does for xLSTM: arxiv.org/abs/2503.14376
Chunking blocks via recurrent updates speeds up computation considerably.
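To make the chunking idea concrete, here is a minimal pure-Python sketch. It assumes a simplified, ungated linear memory (state += k vᵀ, output = qᵀ state); it is not the TFLA kernels and not the method of either paper, and all function names are hypothetical. The point is only that a chunk-by-chunk pass, carrying one fixed-size state between chunks, reproduces the token-by-token recurrence.

```python
def run_stepwise(qs, ks, vs):
    """Reference: update the matrix memory one token at a time."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


def run_chunked(qs, ks, vs, chunk_size):
    """Same outputs, computed chunk by chunk with one carried state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # size does not grow with sequence length
    outputs = []
    for start in range(0, len(qs), chunk_size):
        q_c = qs[start:start + chunk_size]
        k_c = ks[start:start + chunk_size]
        v_c = vs[start:start + chunk_size]
        for t, q in enumerate(q_c):
            # Inter-chunk part: read from the state left by earlier chunks.
            h = [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
            # Intra-chunk part: causal scores over positions s <= t in this chunk.
            for s in range(t + 1):
                score = sum(a * b for a, b in zip(q, k_c[s]))
                for j in range(d_v):
                    h[j] += score * v_c[s][j]
            outputs.append(h)
        # Recurrent update: fold the whole chunk into the state.
        for k, v in zip(k_c, v_c):
            for i in range(d_k):
                for j in range(d_v):
                    state[i][j] += k[i] * v[j]
    return outputs


# Small integer-valued inputs so both variants agree exactly.
T = 7
qs = [[float(t % 3), float(t % 2)] for t in range(T)]
ks = [[float((t + 1) % 4), float(t % 3)] for t in range(T)]
vs = [[float(t), 1.0, float(t % 2)] for t in range(T)]
print(run_chunked(qs, ks, vs, chunk_size=3) == run_stepwise(qs, ks, vs))
```

For a fixed chunk size the work grows linearly with sequence length, and only the state plus the current chunk has to be held in memory. In practice the intra-chunk part is done with matrix multiplications, which is where the speed-up comes from.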
Symbol-equivariant Recurrent Reasoning Models (SE-RRM)
SE-RRM advances HRM and TRM -- guaranteed identical solutions for problems with permuted colors (ARC AGI) or digits (Sudoku).
Coolest part: extrapolation to larger problem sizes!!!
P: arxiv.org/abs/2603.02193
C: github.com/ml-jku/SE-RRM
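What "identical solutions for permuted symbols" means can be checked on a toy example. The sketch below is not SE-RRM; the grid functions and helper names are hypothetical stand-ins. It only tests the property f(π(x)) = π(f(x)) for every relabeling π of the symbols.

```python
from itertools import permutations


def permute_symbols(grid, perm):
    """Relabel every cell of the grid according to the mapping perm."""
    return [[perm[c] for c in row] for row in grid]


def row_majority(grid):
    """Toy map: fill each row with its most frequent symbol.
    Ties go to the symbol that appears first in the row (by position, not value)."""
    out = []
    for row in grid:
        counts = {}
        for c in row:
            counts[c] = counts.get(c, 0) + 1
        best = max(row, key=lambda c: counts[c])  # max keeps the first maximal item
        out.append([best] * len(row))
    return out


def row_minimum(grid):
    """Toy counter-example: depends on symbol values, so relabeling changes the result."""
    return [[min(row)] * len(row) for row in grid]


def is_symbol_equivariant(fn, grid, symbols):
    """Check fn(perm(grid)) == perm(fn(grid)) for all permutations of the symbols."""
    for image in permutations(symbols):
        perm = dict(zip(symbols, image))
        if fn(permute_symbols(grid, perm)) != permute_symbols(fn(grid), perm):
            return False
    return True


grid = [[0, 1, 1], [2, 0, 2], [1, 2, 0]]
print(is_symbol_equivariant(row_majority, grid, [0, 1, 2]))
print(is_symbol_equivariant(row_minimum, grid, [0, 1, 2]))
```

The first map passes because it only uses counts and positions; the second fails because it compares symbol values, which is the kind of dependence a symbol-equivariant model rules out by construction.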
xLSTM is more expressive than Transformer, Mamba: arxiv.org/abs/2603.03612
*nonlinear RNNs: sLSTM, LSTM
*DLPR linear RNNs: mLSTM, RWKV-7, DeltaNet
*Non PNC1-complete: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs.”
World models require nonlinear RNNs.
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...
xLSTM Distillation: arxiv.org/abs/2603.15590
Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance.
Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.
Lukas Aichberger
Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...
# AI in Drug discovery just BROKE THROUGH a wall #
A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate.
Instead of just 40K structure-based data points, ConGLUDe is trained on 100M data points from ligand-based data.
P: arxiv.org/abs/2601.09693