Inlay

RNNs such as xLSTM, combined with a vertically chunked inference strategy, use memory efficiently: arxiv.org/abs/2604.18199. Chunking enables linear-time, constant-memory computation, as in TFLA for xLSTM: arxiv.org/abs/2503.14376. Processing chunks via recurrent updates speeds up computation considerably.
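The idea of carrying a recurrent state across chunks can be illustrated with a minimal sketch. This is not xLSTM, TFLA, or the method from either paper: it assumes a toy scalar linear recurrence and mean pooling, and every name in it (`recurrent_step`, `process_chunk`, `embed_chunked`, `decay`) is hypothetical. It only shows that a chunked pass keeps just a fixed-size state between chunks and still matches the single-pass result.

```python
from typing import List, Sequence, Tuple


def recurrent_step(state: float, x: float, decay: float = 0.9) -> float:
    """Toy linear recurrence (an assumption, not xLSTM): h_t = decay*h_{t-1} + (1-decay)*x_t."""
    return decay * state + (1.0 - decay) * x


def process_chunk(state: float, chunk: Sequence[float], decay: float = 0.9) -> Tuple[float, float]:
    """Run the recurrence over one chunk; return the final state and the sum of hidden states."""
    total = 0.0
    for x in chunk:
        state = recurrent_step(state, x, decay)
        total += state
    return state, total


def embed_chunked(tokens: List[float], chunk_size: int, decay: float = 0.9) -> float:
    """Mean-pooled hidden state, computed chunk by chunk.

    Between chunks only two scalars are kept (state and running sum), so the
    carried memory does not grow with sequence length, and each token is
    visited exactly once (linear time).
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    state, total = 0.0, 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        state, chunk_total = process_chunk(state, chunk, decay)
        total += chunk_total
    return total / len(tokens)


def embed_full(tokens: List[float], decay: float = 0.9) -> float:
    """Reference: the same computation with the whole sequence as one chunk."""
    return embed_chunked(tokens, chunk_size=len(tokens), decay=decay)


if __name__ == "__main__":
    seq = [float(i % 7) for i in range(1000)]
    chunked = embed_chunked(seq, chunk_size=64)
    full = embed_full(seq)
    # The two results agree up to floating-point rounding.
    print(abs(chunked - full) < 1e-9)
```

In a real model the carried state would be a tensor per layer rather than a scalar, but the same property holds: its size depends on the model, not on the sequence length.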

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...