Comparison of sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet: arxiv.org/abs/2606.12364 The three are compared on code pre-training, distillation, and time series. xLSTM outperforms the others due to its gating scheme and state tracking.
xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370 sLSTM models fine-grained temporal structures, while mLSTM finds large-scale global patterns. xLSTM achieves an AUC beyond 0.99, a TPR above 98%, and an FPR below 1%, and is robust against noise, lens type, and lens mass. Cool xLSTM application.
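For readers less familiar with the detection metrics quoted above, here is a minimal sketch of how a true positive rate (TPR) and false positive rate (FPR) are computed from binary labels and predictions. The function name `tpr_fpr` and the toy data are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Tuple


def tpr_fpr(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels and predictions (1 = positive)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    # TPR = TP / (TP + FN); FPR = FP / (FP + TN). Guard against empty classes.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example (hypothetical data): 4 positives, 4 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]
print(tpr_fpr(labels, preds))
```

In this toy case three of four positives are detected and one of four negatives is flagged, so the pair is (0.75, 0.25).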
Drug design is significantly accelerated by ConGLUDe. Protein–small-molecule interactions can be screened orders of magnitude faster using the ConGLUDe approach.
xLSTM for Financial Time Series: arxiv.org/abs/2603.01820 "VLSTM achieved the highest overall Sharpe ratio" "VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics" "xLSTM achieves the highest portfolio-level cost buffer" xLSTM excels in financial time series, as TiRex does.
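As background for the Sharpe ratio mentioned in the quotes above, a minimal sketch of the textbook definition follows: mean excess return divided by its sample standard deviation, scaled by the square root of the number of periods per year. The function name, the default of 252 trading days, and the sample returns are assumptions for illustration; the paper may use a different convention.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe ratio from per-period returns (textbook form)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation, needs >= 2 points
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily = [0.01, -0.005, 0.007, 0.002, -0.001]
print(round(sharpe_ratio(daily), 3))
```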
A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini... The model is used as a market-state encoder, while reinforcement learning handles the trading decisions. This performance of xLSTM should be confirmed by further investigations.
xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.
RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199 Chunking enables linear-time and constant-memory inference, as TFLA does for xLSTM: arxiv.org/abs/2503.14376 Processing blocks via recurrent updates speeds up computation considerably.
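To illustrate why a recurrent model can process a long input chunk by chunk with constant memory, here is a minimal sketch using a toy scalar recurrence. It is a generic illustration, not the vertically chunked scheme of the paper and not TFLA; the names `chunks`, `step`, `run_full`, `run_chunked`, and the decay value are all assumptions.

```python
from typing import Iterable, Iterator, List, Sequence


def chunks(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from `stream`."""
    buf: List[float] = []
    for x in stream:
        buf.append(x)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def step(state: float, x: float, decay: float = 0.9) -> float:
    """Toy recurrence: h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    return decay * state + (1.0 - decay) * x


def run_full(xs: Sequence[float], decay: float = 0.9) -> float:
    """Reference: run the recurrence over the whole sequence held in memory."""
    state = 0.0
    for x in xs:
        state = step(state, x, decay)
    return state


def run_chunked(stream: Iterable[float], chunk_size: int, decay: float = 0.9) -> float:
    """Only one chunk and one fixed-size state are held at any time."""
    state = 0.0
    for chunk in chunks(stream, chunk_size):
        for x in chunk:
            state = step(state, x, decay)  # state carries over between chunks
    return state


xs = [float(i % 7) for i in range(1000)]
print(run_chunked(iter(xs), chunk_size=64) == run_full(xs))
```

Because the state has a fixed size, the total cost grows linearly with sequence length while peak memory depends only on the chunk size, which is the property the post refers to.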
xLSTM is more expressive than Transformer and Mamba: arxiv.org/abs/2603.03612
* nonlinear RNNs: sLSTM, LSTM
* DLPR linear RNNs: mLSTM, RWKV-7, DeltaNet
* Non PNC1-complete: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs.” World models require nonlinear RNNs.
5/ Takeaway LLMs do not always need to externalize their thoughts. They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡 Full paper: arxiv.org/abs/2605.30343 Huge thanks to @hochreitersepp.bsky.social for the guidance!
Symbol-equivariant Recurrent Reasoning Models (SE-RRM) SE-RRM advances HRM and TRM -- guaranteed identical solutions for problems with permuted colors (ARC AGI) or digits (Sudoku). Coolest part: extrapolation to larger problem sizes!!! P: arxiv.org/abs/2603.02193 C: github.com/ml-jku/SE-RRM
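The symbol-equivariance property mentioned above can be stated as: relabeling the symbols of the input and then solving gives the same result as solving and then relabeling. A minimal sketch of that check with a toy solver follows. The solver `fill_missing` and the helper `relabel` are hypothetical stand-ins and have nothing to do with the SE-RRM architecture itself.

```python
from typing import Dict, List, Optional, Sequence

Row = List[Optional[int]]


def fill_missing(row: Row, symbols: Sequence[int]) -> List[int]:
    """Toy solver: fill the single empty cell with the one unused symbol."""
    used = {s for s in row if s is not None}
    missing = (set(symbols) - used).pop()
    return [missing if s is None else s for s in row]


def relabel(row: Row, perm: Dict[int, int]) -> Row:
    """Apply a permutation of symbols to every filled cell."""
    return [None if s is None else perm[s] for s in row]


symbols = [1, 2, 3]
perm = {1: 3, 2: 1, 3: 2}  # an arbitrary relabeling of the symbols
row: Row = [1, None, 3]

solve_then_relabel = relabel(fill_missing(row, symbols), perm)
relabel_then_solve = fill_missing(relabel(row, perm), [perm[s] for s in symbols])
print(solve_then_relabel == relabel_then_solve)
```

A model with this property built in cannot give a different answer just because the colors or digits were renamed, which is what the post means by guaranteed identical solutions.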
Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models
Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...
arxiv.org
Unlocking the Working Memory of Large Language Models for Latent Reasoning
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...
arxiv.org
Günter Klambauer
Lukas Aichberger
AI in drug discovery just BROKE THROUGH a wall. A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate. Instead of just 40K structure-based data points, ConGLUDe is trained on 100M data points from ligand-based data. P: arxiv.org/abs/2601.09693
Günter Klambauer