RNNs like xLSTM with a vertically chunked inference strategy for efficient memory usage: arxiv.org/abs/2604.18199 Chunking enables linear-time, constant-memory computation, as TFLA does for xLSTM: arxiv.org/abs/2503.14376 Splitting the sequence into chunks connected by recurrent state updates speeds up computation considerably.
1mo
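The chunking idea can be illustrated on a much simpler model than xLSTM. The sketch below assumes a scalar gated linear recurrence h_t = f_t * h_{t-1} + x_t (a stand-in for the matrix-valued mLSTM state, with no normalizer) and compares step-by-step evaluation with a chunkwise one that passes only the last state of each chunk forward. It is not TFLA or the method of either paper; `sequential` and `chunked` are hypothetical helpers.

```python
from itertools import accumulate
import operator


def sequential(f, x, h0=0.0):
    """Reference: evaluate h_t = f_t * h_{t-1} + x_t one step at a time."""
    out, h = [], h0
    for ft, xt in zip(f, x):
        h = ft * h + xt
        out.append(h)
    return out


def chunked(f, x, chunk=4, h0=0.0):
    """Evaluate the same (assumed, simplified) recurrence chunk by chunk.

    Inside a chunk that starts from the carried state h_in:
        h_t = (f_0 * ... * f_t) * h_in + sum_{s<=t} (f_{s+1} * ... * f_t) * x_s
    Work inside a chunk is quadratic in the chunk size; only the last
    state of each chunk is handed to the next one.
    """
    out, h = [], h0
    for start in range(0, len(x), chunk):
        fc = f[start:start + chunk]
        xc = x[start:start + chunk]
        # Cumulative gate products from the chunk start: a[t] = f_0 * ... * f_t.
        a = list(accumulate(fc, operator.mul))
        for t in range(len(xc)):
            # Intra-chunk term, walking backwards so `decay` = f_{s+1} * ... * f_t.
            intra, decay = 0.0, 1.0
            for s in range(t, -1, -1):
                intra += decay * xc[s]
                decay *= fc[s]
            out.append(a[t] * h + intra)
        h = out[-1]  # carry this chunk's final state into the next chunk
    return out
```

Any chunk size gives the same outputs up to floating-point rounding; larger chunks trade more intra-chunk work (which parallelizes well) for fewer sequential state hand-offs.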
A high-performance trading architecture built around xLSTM: allenarch.dev/blog/combini... The model is used as a market-state encoder, while reinforcement learning handles the trading decisions. These performance results for xLSTM should be confirmed by further investigation.
xLSTM for Financial Time Series: arxiv.org/abs/2603.01820 "VLSTM achieved the highest overall Sharpe ratio"; "VxLSTM and LPatchTST exhibited superior downside-adjusted characteristics"; "xLSTM achieves the highest portfolio-level cost buffer". xLSTM excels at financial time series, as TiRex does.
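For reference, the Sharpe ratio quoted above is the mean excess return divided by its standard deviation. A minimal sketch, assuming per-period simple returns, a constant per-period risk-free rate, the sample standard deviation, and annualization by sqrt(252) for daily data; the paper's exact conventions may differ, and `sharpe_ratio` is a hypothetical helper.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period simple returns.

    Assumptions: constant per-period risk-free rate, sample standard
    deviation (n - 1), and sqrt(periods_per_year) annualization.
    """
    excess = [r - risk_free for r in returns]
    mu = statistics.mean(excess)
    sigma = statistics.stdev(excess)  # needs at least two observations
    return math.sqrt(periods_per_year) * mu / sigma


# Hypothetical daily returns, for illustration only.
daily = [0.004, -0.002, 0.006, 0.001, -0.003, 0.005]
print(round(sharpe_ratio(daily), 2))
```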
Comparison of sub-quadratic architectures xLSTM, Mamba-2, and Gated DeltaNet: arxiv.org/abs/2606.12364 The three are compared on code pre-training, distillation, and time-series tasks. xLSTM outperforms the others thanks to its gating scheme and state tracking.
xLSTM for Lensed Gravitational Waves: arxiv.org/abs/2512.21370 sLSTM models fine-grained temporal structure, while mLSTM captures large-scale global patterns. xLSTM achieves an AUC above 0.99, a TPR above 98%, and an FPR below 1%, and it is robust to noise, lens type, and lens mass. Cool xLSTM application.
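The quoted metrics are computed from a binary classifier's scores: TPR and FPR at a chosen decision threshold, and AUC as the probability that a random positive scores above a random negative. A small sketch with hypothetical labels and scores (1 = lensed, 0 = not lensed, an assumed encoding); it is not the paper's evaluation code.

```python
def tpr_fpr(labels, scores, threshold):
    """True- and false-positive rates when predicting positive for score >= threshold."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six events (1 = lensed, 0 = not lensed).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(tpr_fpr(labels, scores, threshold=0.45), roc_auc(labels, scores))
```

The pairwise AUC loop is quadratic in the number of events; it is written this way for clarity, and a sort-based rank computation gives the same value faster.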
Drug design is significantly accelerated by ConGLUDe. Protein–small-molecule interactions can be screened orders of magnitude faster using the ConGLUDe approach.
xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear-time xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. Efficient xLSTM variants of instruction-tuned Llama, Qwen, and Olmo models.
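Distillation usually trains the student to match the teacher's next-token distribution. The sketch below shows one common objective, a temperature-scaled KL divergence from teacher to student at a single token position, with the classic T² scaling. This is an assumption for illustration; the paper's actual loss and training recipe may differ, and both helpers are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(p_teacher || p_student) at one token position, scaled by T**2.

    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over a 3-token vocabulary.
print(distillation_kl([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
```

In a full training loop this per-position loss would be averaged over all token positions of a batch, with the teacher (a Transformer) frozen and only the xLSTM student updated.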
xLSTM is more expressive than Transformers and Mamba: arxiv.org/abs/2603.03612
* Nonlinear RNNs: sLSTM, LSTM
* DLPR linear RNNs: mLSTM, RWKV-7, DeltaNet
* Non PNC1-complete: Mamba, Transformer
"fundamental expressivity gaps between linear and nonlinear RNNs." World models require nonlinear RNNs.
Symbol-equivariant Recurrent Reasoning Models (SE-RRM): SE-RRM advances HRM and TRM, guaranteeing identical solutions for problems with permuted colors (ARC-AGI) or digits (Sudoku). Coolest part: extrapolation to larger problem sizes!!! P: arxiv.org/abs/2603.02193 C: github.com/ml-jku/SE-RRM
Günter Klambauer
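Symbol equivariance means that relabeling the symbols in the input relabels the output the same way: solver(π(x)) = π(solver(x)). The sketch below checks that property for a hypothetical toy solver that fills the single blank in a Sudoku-style row. It only illustrates the property the post says SE-RRM guarantees, not how SE-RRM achieves it; `fill_row`, `permute`, and `is_symbol_equivariant` are illustrative names.

```python
def fill_row(row, n):
    """Hypothetical toy solver: fill the single 0 in a row with the missing symbol from 1..n."""
    missing = (set(range(1, n + 1)) - set(row)).pop()
    return [missing if v == 0 else v for v in row]


def permute(row, perm):
    """Relabel symbols via the dict `perm`; 0 marks a blank and is left unchanged."""
    return [perm[v] if v != 0 else 0 for v in row]


def is_symbol_equivariant(solver, row, perm, n):
    """Check solver(perm(row)) == perm(solver(row)) for one input and one permutation."""
    return solver(permute(row, perm), n) == permute(solver(row, n), perm)


perm = {1: 2, 2: 3, 3: 4, 4: 1}  # hypothetical relabeling of the digits 1..4
print(is_symbol_equivariant(fill_row, [3, 0, 1, 4], perm, n=4))
```

A solver that treats some symbol specially (for example, always writing a 1 into blanks) fails this check, which is the kind of behavior an equivariant model rules out by construction.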
5/ Takeaway: LLMs do not always need to externalize their thoughts. They can learn to reason in working memory instead, decoupling intermediate computation from autoregressive generation 💡 Full paper: arxiv.org/abs/2605.30343 Huge thanks to @hochreitersepp.bsky.social for the guidance!
15d
Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alter...
arxiv.org
Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models
Lukas Aichberger
AI in drug discovery just BROKE THROUGH a wall! A newer AI model, ConGLUDe, is as fast as DrugCLIP but much more accurate. Instead of just 40K structure-based data points, ConGLUDe is trained on 100M ligand-based data points. P: arxiv.org/abs/2601.09693
4mo
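CLIP-style screening models such as DrugCLIP are typically fast because each protein and each molecule is embedded once, after which candidates are ranked by a cheap similarity score. A minimal sketch of that ranking step, assuming precomputed embedding vectors and cosine similarity; the embeddings and the `screen` helper are hypothetical, and this does not reproduce ConGLUDe's model.

```python
import heapq
import math


def normalize(v):
    """Scale a (nonzero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def screen(protein_emb, ligand_embs, k=3):
    """Return the top-k (ligand index, cosine similarity) pairs for one protein."""
    p = normalize(protein_emb)
    scored = ((i, sum(a * b for a, b in zip(p, normalize(lig))))
              for i, lig in enumerate(ligand_embs))
    return heapq.nlargest(k, scored, key=lambda t: t[1])


# Hypothetical 2-D embeddings, for illustration only.
print(screen([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]], k=2))
```

Because ligand embeddings can be computed once and reused for every target, screening a large library reduces to a batch of dot products.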
Günter Klambauer
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to auto...
arxiv.org
Unlocking the Working Memory of Large Language Models for Latent Reasoning