Comparison of sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet: arxiv.org/abs/2606.12364 The three are compared on code pre-training, distillation, and time series. xLSTM outperforms the others due to its gating scheme and state tracking.

xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370 sLSTM models fine-grained temporal structures, while mLSTM finds large-scale global patterns. xLSTM achieves an AUC above 0.99, a TPR above 98%, and an FPR below 1%, and is robust to noise, lens type, and lens mass. Cool xLSTM application.
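For readers less familiar with the metrics quoted above, here is a minimal sketch of how TPR, FPR, and AUC can be computed from classifier scores. The labels and scores are made-up toy values, not data from the paper, and the function names are illustrative only.

```python
def tpr_fpr(labels, scores, threshold):
    """Return (TPR, FPR) for binary labels (1 = positive) at a score threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = lensed signal, 0 = unlensed signal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

tpr, fpr = tpr_fpr(labels, scores, threshold=0.5)
print(tpr, fpr)             # 0.75 0.25
print(auc(labels, scores))  # 0.9375
```

TPR and FPR depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the post reports all three.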

Drug design is significantly accelerated by ConGLUDe. Protein–small-molecule interactions can be screened orders of magnitude faster using the ConGLUDe approach.

xLSTM for Financial Time Series: arxiv.org/abs/2603.01820 "VLSTM achieved the highest overall Sharpe ratio" "VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics" "xLSTM achieves the highest portfolio-level cost buffer" xLSTM excels in financial time series as TiRex does.

A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini... The model is used as a market-state encoder, while reinforcement learning handles the trading decisions. This performance of xLSTM should be confirmed by further investigations.

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.

RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199 Chunking enables linear-time and constant-memory inference, similar to TFLA for xLSTM: arxiv.org/abs/2503.14376 Chunking blocks via recurrent updates speeds up computation considerably.
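As a rough illustration of why recurrent updates allow chunked processing with constant memory, here is a toy sketch. It uses a scalar linear recurrence chosen purely for illustration. It is not the xLSTM cell, not the TFLA kernel, and not the paper's vertically chunked strategy, and all names (`recurrent_step`, `run_chunked`, `decay`) are hypothetical.

```python
def recurrent_step(state, x, decay=0.9):
    """One toy recurrent update: the new state depends only on the old state and x."""
    return decay * state + x


def run_full(xs, decay=0.9):
    """Process the whole sequence in one pass."""
    state = 0.0
    for x in xs:
        state = recurrent_step(state, x, decay)
    return state


def chunks(xs, size):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(xs), size):
        yield xs[i:i + size]


def run_chunked(chunk_iter, decay=0.9):
    """Process one chunk at a time, carrying only the fixed-size state across chunks.

    Only the current chunk and the state need to be held at any time, so the
    working memory does not grow with the total sequence length.
    """
    state = 0.0
    for chunk in chunk_iter:
        for x in chunk:
            state = recurrent_step(state, x, decay)
    return state


xs = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0]
print(run_full(xs) == run_chunked(chunks(xs, 3)))  # True
```

Both functions apply the same updates in the same order, so the results match exactly. Total work stays linear in sequence length, and the state carried between chunks has a fixed size.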

xLSTM is more expressive than Transformer and Mamba: arxiv.org/abs/2603.03612
* Nonlinear RNNs: sLSTM, LSTM
* DLPR linear RNNs: mLSTM, RWKV-7, DeltaNet
* Non PNC1-complete: Mamba, Transformer
"fundamental expressivity gaps between linear and nonlinear RNNs." World models require nonlinear RNNs.

5/ Takeaway LLMs do not always need to externalize their thoughts. They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡 Full paper: arxiv.org/abs/2605.30343 Huge thanks to @hochreitersepp.bsky.social for the guidance!

Symbol-equivariant Recurrent Reasoning Models (SE-RRM) SE-RRM advances HRM and TRM -- guaranteed identical solutions for problems with permuted colors (ARC AGI) or digits (Sudoku). Coolest part: extrapolation to larger problem sizes!!! P: arxiv.org/abs/2603.02193 C: github.com/ml-jku/SE-RRM


Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models (arxiv.org)
Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...

Unlocking the Working Memory of Large Language Models for Latent Reasoning (arxiv.org)
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...

Günter Klambauer

Lukas Aichberger

# AI in Drug discovery just BROKE THROUGH a wall # A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate. Instead of just 40K structure-based data points, ConGLUDe is trained on 100M data points from ligand-based data. P: arxiv.org/abs/2601.09693

