RNNs such as xLSTM, paired with a vertically chunked inference strategy for memory-efficient processing: arxiv.org/abs/2604.18199
Chunking enables linear-time, constant-memory processing, as in TFLA for xLSTM: arxiv.org/abs/2503.14376
Processing the sequence in chunks, with recurrent state updates carried from one chunk to the next, speeds up computation considerably.
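A minimal sketch of the chunking idea, under stated assumptions: the toy recurrence h_t = decay * h_{t-1} + x_t stands in for a real xLSTM cell, "vertical" is read as pushing each chunk through all layers before the next chunk is touched, and the names run_layer, embed_full, and embed_chunked are hypothetical. It is not taken from either paper. It only illustrates why carrying one state per layer across chunks gives the same mean-pooled embedding as a full pass while holding activations for a single chunk at a time.

```python
from typing import List, Sequence, Tuple


def run_layer(xs: Sequence[float], state: float, decay: float) -> Tuple[List[float], float]:
    """Apply the toy recurrence h_t = decay * h_{t-1} + x_t to a run of inputs.

    Returns the per-step outputs and the final state so the caller can resume later.
    """
    outs = []
    for x in xs:
        state = decay * state + x
        outs.append(state)
    return outs, state


def embed_full(tokens: Sequence[float], decays: Sequence[float]) -> float:
    """Baseline: run each layer over the whole sequence (activations grow with length)."""
    acts = list(tokens)
    for decay in decays:
        acts, _ = run_layer(acts, 0.0, decay)
    total = 0.0
    for a in acts:
        total += a
    return total / len(acts)


def embed_chunked(tokens: Sequence[float], decays: Sequence[float], chunk_size: int) -> float:
    """Chunked variant: each chunk passes through all layers before the next chunk.

    Only one chunk of activations plus one state per layer is alive at any time.
    """
    states = [0.0] * len(decays)  # one carried recurrent state per layer
    total, count = 0.0, 0         # running sum for mean pooling
    for start in range(0, len(tokens), chunk_size):
        acts = list(tokens[start:start + chunk_size])
        for i, decay in enumerate(decays):
            acts, states[i] = run_layer(acts, states[i], decay)
        for a in acts:
            total += a
        count += len(acts)
    return total / count


if __name__ == "__main__":
    tokens = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0]
    decays = [0.9, 0.5]  # two toy layers
    print(embed_full(tokens, decays), embed_chunked(tokens, decays, chunk_size=3))
```

Because every layer here is causal, the chunk boundaries do not change any intermediate value. The work stays linear in sequence length, and peak memory depends on chunk_size and the number of layers rather than on the sequence length.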
Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...