Comparison of the sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet on code pre-training, distillation, and time series: arxiv.org/abs/2606.12364 xLSTM outperforms the others due to its gating scheme and state tracking.

5/ Takeaway LLMs do not always need to externalize their thoughts. They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡 Full paper: arxiv.org/abs/2605.30343 Huge thanks to @hochreitersepp.bsky.social for the guidance!

RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199 Chunking enables linear-time and constant-memory computation, like TFLA for xLSTM: arxiv.org/abs/2503.14376 Chunking blocks via recurrent updates speeds up computation considerably.
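As a rough illustration of the chunking idea in the post above (not the paper's kernels or its actual strategy), the toy sketch below runs a scalar gated recurrence over a long sequence in fixed-size chunks and carries only the recurrent state between chunks. The names `run_chunk` and `embed_chunked`, the decay value, and the mean pooling are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def run_chunk(state: float, chunk: List[float], decay: float) -> Tuple[float, float]:
    """Advance a toy recurrence h_t = decay * h_{t-1} + x_t over one chunk.

    Returns the final state and the sum of hidden states in the chunk
    (used for mean pooling). Hypothetical stand-in for a real RNN block.
    """
    pooled = 0.0
    for x in chunk:
        state = decay * state + x
        pooled += state
    return state, pooled


def embed_chunked(seq: Iterable[float], chunk_size: int, decay: float = 0.9) -> float:
    """Mean-pooled hidden state, computed chunk by chunk.

    Only one chunk plus a constant-size state is held at a time, so memory
    does not grow with sequence length and time is linear in it.
    """
    state, total, count = 0.0, 0.0, 0
    chunk: List[float] = []
    for x in seq:
        chunk.append(x)
        if len(chunk) == chunk_size:
            state, pooled = run_chunk(state, chunk, decay)
            total += pooled
            count += len(chunk)
            chunk = []
    if chunk:  # leftover partial chunk at the end of the sequence
        state, pooled = run_chunk(state, chunk, decay)
        total += pooled
        count += len(chunk)
    return total / count if count else 0.0
```

Because the state is handed from one chunk to the next, the chunked result matches a single pass over the whole sequence up to floating-point rounding, and `seq` can be a generator that is never fully materialized.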

xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370 sLSTM models fine-grained temporal structures, while mLSTM finds large-scale global patterns. xLSTM achieves an AUC beyond 0.99, a TPR above 98%, and an FPR below 1%, and is robust against noise, lens type, and lens mass. Cool xLSTM application.
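For readers less familiar with the metrics quoted above, here is a minimal sketch of how TPR and FPR are computed from binary labels and predictions. The function name `tpr_fpr` and the example data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def tpr_fpr(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), fp / (fp + tn)


# Made-up example: 4 lensed (1) and 4 unlensed (0) signals.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 1]
print(tpr_fpr(labels, preds))  # (0.75, 0.25)
```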

xLSTM for Financial Time Series: arxiv.org/abs/2603.01820 "VLSTM achieved the highest overall Sharpe ratio" "VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics" "xLSTM achieves the highest portfolio-level cost buffer" xLSTM excels at financial time series, as TiRex does.
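The Sharpe ratio mentioned in the quotes above is mean excess return divided by the standard deviation of returns. The sketch below shows one common way to compute it; the function name `sharpe_ratio`, the annualization by the square root of the number of periods per year, and the sample returns are assumptions, and the paper may use a different convention.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,
    periods_per_year: int = 1,
) -> float:
    """Sharpe ratio from per-period returns, optionally annualized.

    risk_free is the per-period risk-free rate. Uses the sample standard
    deviation, so at least two returns are required.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


# Made-up daily returns, annualized with 252 trading days.
daily = [0.001, -0.002, 0.003, 0.0005, -0.001]
print(sharpe_ratio(daily, periods_per_year=252))
```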

Symbol-equivariant Recurrent Reasoning Models (SE-RRM) SE-RRM advances HRM and TRM -- guaranteed identical solutions for problems with permuted colors (ARC AGI) or digits (Sudoku). Coolest part: extrapolation to larger problem sizes!!! P: arxiv.org/abs/2603.02193 C: github.com/ml-jku/SE-RRM
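To make "identical solutions for problems with permuted digits" concrete, here is a toy check of symbol equivariance: relabeling the symbols and then solving should give the same result as solving and then relabeling. The solver `fill_row` is a deliberately trivial, hypothetical stand-in (it completes one Sudoku-style row) and has nothing to do with the SE-RRM architecture itself.

```python
from typing import Dict, List, Optional, Sequence


def fill_row(row: Sequence[Optional[int]], symbols: Sequence[int]) -> List[int]:
    """Fill the single empty cell (None) with the one symbol not yet used."""
    missing = set(symbols) - {s for s in row if s is not None}
    if list(row).count(None) != 1 or len(missing) != 1:
        raise ValueError("row must have exactly one empty cell and one unused symbol")
    (fill,) = missing
    return [fill if s is None else s for s in row]


def relabel(row: Sequence[Optional[int]], perm: Dict[int, int]) -> List[Optional[int]]:
    """Apply a symbol permutation, leaving empty cells empty."""
    return [None if s is None else perm[s] for s in row]


def is_equivariant(
    row: Sequence[Optional[int]], symbols: Sequence[int], perm: Dict[int, int]
) -> bool:
    """Check solve(relabel(x)) == relabel(solve(x)) for one input and one permutation."""
    return fill_row(relabel(row, perm), symbols) == relabel(fill_row(row, symbols), perm)


row = [1, None, 3, 4]
perm = {1: 3, 2: 1, 3: 4, 4: 2}  # a bijection on the symbols 1..4
print(is_equivariant(row, [1, 2, 3, 4], perm))  # True
```

A learned model does not get this property for free; the post's point is that SE-RRM guarantees it rather than relying on augmentation.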

Unlocking the Working Memory of Large Language Models for Latent Reasoning

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...

arxiv.org

Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...

arxiv.org

Günter Klambauer

Lukas Aichberger

xLSTM is more expressive than Transformer, Mamba: arxiv.org/abs/2603.03612
* nonlinear RNNs: sLSTM, LSTM
* DLPR linear RNNs: mLSTM, RWKV-7, DeltaNet
* Non PNC1-complete: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs.”
World models require nonlinear RNNs.

2mo

Drug design is significantly accelerated by ConGLUDe. Protein–small-molecule interactions can be screened orders of magnitude faster using the ConGLUDe approach.

4mo

A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini... The model is used as a market-state encoder, while reinforcement learning handles the trading decisions. This performance of xLSTM should be confirmed by further investigations.

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.

2mo

# AI in Drug discovery just BROKE THROUGH a wall # A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate. Instead of just 40K structure-based data points, ConGLUDe is trained on 100M data points from ligand-based data. P: arxiv.org/abs/2601.09693

4mo

Günter Klambauer