xLSTM Distillation: arxiv.org/abs/2603.15590
Near-lossless distillation of quadratic-time Transformer LLMs into linear-time xLSTM architectures yields cost- and energy-efficient alternatives that retain the teacher models' performance.
Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.
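This summary does not state the training objective. As a rough illustration only, the sketch below shows a generic logit-matching distillation loss: a Transformer "teacher" and an xLSTM "student" each produce next-token logits, and the student is trained to match the teacher's distributions. It assumes a per-token forward KL(teacher || student) with temperature scaling, a common knowledge-distillation setup that is not necessarily the paper's recipe. All function names are hypothetical, and it uses plain Python lists instead of any deep-learning framework.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_kl(teacher_logits: Sequence[float],
                    student_logits: Sequence[float],
                    temperature: float = 1.0) -> float:
    """Hypothetical helper: KL(teacher || student) for one token position.

    Multiplied by temperature**2, a common convention that keeps gradient
    magnitudes comparable across temperatures (an assumption here).
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def sequence_distillation_loss(teacher_seq: Sequence[Sequence[float]],
                               student_seq: Sequence[Sequence[float]],
                               temperature: float = 1.0) -> float:
    """Hypothetical helper: mean per-token KL over a sequence of logit vectors."""
    if not teacher_seq or len(teacher_seq) != len(student_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(distillation_kl(t, s, temperature)
                for t, s in zip(teacher_seq, student_seq))
    return total / len(teacher_seq)


if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0], [0.5, 0.1, -1.0]]
    student = [[1.0, 2.0, 3.0], [0.0, 0.4, -0.5]]
    print(sequence_distillation_loss(teacher, student, temperature=2.0))
```

With identical teacher and student logits the loss is exactly 0.0, and it grows as the student's next-token distributions drift from the teacher's. Forward KL is only one choice; reverse KL or hidden-state matching are other options a distillation pipeline might use instead.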