One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization. We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
5mo
hardmaru
Introducing DroPE: Extending Context by Dropping Positional Embeddings
We found embeddings like RoPE aid training but bottleneck long-sequence generalization. Our solution’s simple: treat them as a temporary training scaffold, not a permanent necessity.
arxiv.org/abs/2512.12167
pub.sakana.ai/DroPE
5mo
Video
Sakana AI
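
To make the idea in the posts above concrete, here is a minimal sketch of single-head causal attention with a switch that turns RoPE off. It is an illustration under assumptions, not the DroPE authors' code: the function names `rope` and `causal_attention`, the `use_rope` flag, and the pair-wise rotation layout are all hypothetical choices made for this example. It also covers only the "delete them" half of the recipe. The short recalibration run the post mentions is a training step and is not shown.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos`.

    Illustrative RoPE variant (assumed layout: pairs (0,1), (2,3), ...).
    """
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        theta = pos * base ** (-2.0 * i / dim)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention(q, k, v, use_rope=True):
    """Single-head causal attention over lists of vectors.

    With use_rope=False the queries and keys are used as-is, so the only
    ordering signal left is the causal mask (each step sees only the past).
    """
    if use_rope:
        q = [rope(x, t) for t, x in enumerate(q)]
        k = [rope(x, t) for t, x in enumerate(k)]
    scale = math.sqrt(len(q[0]))
    outputs = []
    for t in range(len(q)):
        # Causal mask: position t attends to positions 0..t only.
        scores = [dot(q[t], k[s]) / scale for s in range(t + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(weights[s] / total * v[s][j] for s in range(t + 1))
            for j in range(len(v[0]))
        ])
    return outputs

# Toy inputs: 3 tokens, head dimension 4.
q = [[1.0, 0.0, 0.5, 2.0], [0.3, 1.0, -1.0, 0.2], [0.0, 0.5, 0.5, 1.0]]
k = [[0.2, 1.0, 0.0, 0.5], [1.0, -0.5, 0.3, 0.1], [0.4, 0.4, -0.2, 0.9]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

with_rope = causal_attention(q, k, v, use_rope=True)
without_rope = causal_attention(q, k, v, use_rope=False)
```

In this sketch, dropping the positional embedding is just skipping the rotation of queries and keys. The weights themselves are untouched, which is why a brief recalibration is needed before the model behaves well without the rotation it was trained with.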