Comparison of sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet: arxiv.org/abs/2606.12364
The three are compared on code pre-training, distillation, and time-series tasks.
xLSTM outperforms the others due to its gating scheme and state tracking.
5/ Takeaway
LLMs do not always need to externalize their thoughts.
They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡
Full paper:
arxiv.org/abs/2605.30343
Huge thanks to @hochreitersepp.bsky.social for the guidance!
RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199
Chunking enables linear-time, constant-memory computation, as TFLA does for xLSTM: arxiv.org/abs/2503.14376
Processing the sequence in chunks that are linked via recurrent state updates speeds up computation considerably.
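A minimal sketch of the chunking idea, assuming a simplified gated linear recurrence S_t = g_t * S_{t-1} + v_t k_t^T, h_t = S_t q_t with a scalar forget gate g_t and no normalizer (an assumption: this is not the exact mLSTM, TFLA, or paper formulation). Inside each chunk, outputs are computed attention-style from earlier positions of the same chunk; a recurrent update then folds the finished chunk into the carried state, so the state's memory stays constant in sequence length. The function names `recurrent` and `chunkwise` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in m]


def decay(g: Vector, j: int, i: int) -> float:
    """Product of gates g[j+1], ..., g[i] (empty product = 1.0)."""
    p = 1.0
    for m in range(j + 1, i + 1):
        p *= g[m]
    return p


def recurrent(q: List[Vector], k: List[Vector], v: List[Vector], g: Vector) -> List[Vector]:
    """Step by step: S_t = g_t * S_{t-1} + v_t k_t^T, h_t = S_t q_t."""
    d_v, d_k = len(v[0]), len(k[0])
    S = [[0.0] * d_k for _ in range(d_v)]
    out = []
    for t in range(len(q)):
        S = [[g[t] * S[a][b] + v[t][a] * k[t][b] for b in range(d_k)]
             for a in range(d_v)]
        out.append(matvec(S, q[t]))
    return out


def chunkwise(q: List[Vector], k: List[Vector], v: List[Vector], g: Vector,
              chunk: int) -> List[Vector]:
    """Same outputs as `recurrent`, computed chunk by chunk."""
    d_v, d_k = len(v[0]), len(k[0])
    S = [[0.0] * d_k for _ in range(d_v)]  # state carried across chunk boundaries
    out = []
    for c in range(0, len(q), chunk):
        idx = range(c, min(c + chunk, len(q)))
        for i in idx:
            # Inter-chunk term: carried state, decayed by g[c..i], read out with q_i.
            h = [decay(g, c - 1, i) * y for y in matvec(S, q[i])]
            # Intra-chunk term: attention-like sum over positions c..i of this chunk.
            for j in range(c, i + 1):
                w = decay(g, j, i) * dot(k[j], q[i])
                h = [a + w * b for a, b in zip(h, v[j])]
            out.append(h)
        # Recurrent update: fold the whole chunk into the carried state.
        last = idx[-1]
        S = [[decay(g, c - 1, last) * S[a][b]
              + sum(decay(g, j, last) * v[j][a] * k[j][b] for j in idx)
              for b in range(d_k)]
             for a in range(d_v)]
    return out
```

Both functions return the same outputs for any chunk size. In an optimized kernel, the intra-chunk loop would become a batched matrix product, which is where the speedup comes from; this pure-Python version only illustrates the bookkeeping.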
xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370
sLSTM models fine-grained temporal structures, while mLSTM finds large-scale global patterns.
xLSTM achieves an AUC above 0.99, a TPR above 98%, and an FPR below 1%, and is robust to noise, lens type, and lens mass.
Cool xLSTM application.
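For reference, a sketch of the standard definitions behind the AUC, TPR, and FPR figures quoted above: TPR and FPR at a fixed decision threshold, and ROC AUC computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half). The function names and the toy labels and scores are hypothetical; the paper's evaluation pipeline may differ.

```python
from typing import List, Tuple


def tpr_fpr(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); score >= threshold predicts positive."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]            # hypothetical toy data
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    print(tpr_fpr(labels, scores, 0.45))   # (0.6666666666666666, 0.3333333333333333)
    print(auc(labels, scores))             # 0.8888888888888888
```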
xLSTM for Financial Time Series: arxiv.org/abs/2603.01820
"VLSTM achieved the highest overall Sharpe ratio"
"VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics"
"xLSTM achieves the highest portfolio-level cost buffer"
xLSTM excels at financial time series, as does TiRex.
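The Sharpe ratio quoted above is, in its standard form, the mean excess return divided by the standard deviation of returns. A minimal sketch, assuming per-period (e.g. daily) returns, a constant per-period risk-free rate, the sample standard deviation, and annualization by sqrt(252); all of these conventions are assumptions, since the paper's exact definition is not given here. The function name `sharpe_ratio` is hypothetical.

```python
import math
from typing import List


def sharpe_ratio(returns: List[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio from per-period returns.

    Assumptions: constant per-period risk-free rate, sample standard
    deviation (n - 1 denominator), sqrt(periods_per_year) annualization.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if var == 0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

Downside-adjusted measures such as the Sortino ratio replace the denominator with the deviation of negative returns only.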
Symbol-equivariant Recurrent Reasoning Models (SE-RRM)
SE-RRM advances HRM and TRM: it is guaranteed to give the same solutions, up to the permutation, for problems with permuted colors (ARC-AGI) or digits (Sudoku).
Coolest part: extrapolation to larger problem sizes!!!
P: arxiv.org/abs/2603.02193
C: github.com/ml-jku/SE-RRM
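Symbol equivariance can be stated as f(π(x)) = π(f(x)): relabeling the symbols of a puzzle relabels the solution in the same way. A toy sketch of checking this property, using hypothetical helpers and a deliberately naive row-filling "solver" (not SE-RRM): `fill_ascending` breaks the symmetry whenever it has to decide which missing symbol goes into which blank, the kind of failure that SE-RRM's guarantee rules out.

```python
from itertools import permutations
from typing import Callable, Dict, List

Row = List[int]  # symbols 1..n; 0 marks a blank cell


def permute(row: Row, perm: Dict[int, int]) -> Row:
    """Relabel symbols; blanks (0) are left unchanged."""
    return [perm.get(x, x) for x in row]


def fill_ascending(row: Row) -> Row:
    """Naive toy 'solver': fill blanks with the missing symbols in ascending order."""
    missing = iter(sorted(set(range(1, len(row) + 1)) - set(row)))
    return [next(missing) if x == 0 else x for x in row]


def is_equivariant_on(solver: Callable[[Row], Row], puzzle: Row,
                      perm: Dict[int, int]) -> bool:
    """Check f(pi(x)) == pi(f(x)) for one relabeling pi."""
    return solver(permute(puzzle, perm)) == permute(solver(puzzle), perm)


def equivariant_for_all(solver: Callable[[Row], Row], puzzle: Row) -> bool:
    """Check the property for every relabeling of the symbols 1..n."""
    symbols = list(range(1, len(puzzle) + 1))
    return all(is_equivariant_on(solver, puzzle, dict(zip(symbols, p)))
               for p in permutations(symbols))
```

With a single blank the answer is forced, so the check passes for every relabeling; with two blanks, swapping symbols 2 and 4 in `[1, 0, 3, 0]` exposes the ordering tie-break.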
xLSTM is more expressive than Transformers and Mamba: arxiv.org/abs/2603.03612
* Nonlinear RNNs: sLSTM, LSTM
* DPLR linear RNNs: mLSTM, RWKV-7, DeltaNet
* Non-PNC1-complete: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs.”
World models require nonlinear RNNs.
ConGLUDe significantly accelerates drug design: protein–small-molecule interactions can be screened orders of magnitude faster with the ConGLUDe approach.
A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini...
The model serves as a market-state encoder, while reinforcement learning handles the trading decisions. These xLSTM results should be confirmed by further investigation.
xLSTM Distillation: arxiv.org/abs/2603.15590
Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance.
Efficient xLSTM variants of instruction-tuned Llama, Qwen, and OLMo models.
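To illustrate what distilling a Transformer teacher into an xLSTM student can involve, here is a generic per-token logit-distillation loss in the style of Hinton et al.: KL divergence between temperature-softened teacher and student distributions, scaled by T². This is an assumption for illustration; the paper's actual objective may differ, and the names `softmax` and `distill_kl` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def distill_kl(teacher_logits: List[float], student_logits: List[float],
               temperature: float = 2.0) -> float:
    """KL(p_teacher || p_student) on temperature-softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In training, this loss would be averaged over token positions and often mixed with the usual next-token cross-entropy; the T² factor keeps gradient magnitudes comparable across temperatures.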
# AI in Drug discovery just BROKE THROUGH a wall #
A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate.
Instead of just 40K structure-based datapoints, ConGLUDe is trained on 100M ligand-based datapoints.
P: arxiv.org/abs/2601.09693