2) We change our parameterization to a diagonal mechanism inspired by SSMs, which lets us reduce parameters by 10x while massively increasing performance 💪 For our initial benchmarks we pre-train Banyan on 10M tokens of English and test STS, retrieval and classification... 4/🧵
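
To make the parameter saving concrete, here is a toy sketch of what a diagonal parameterization can look like next to a dense one. It is an illustration under stated assumptions, not Banyan's actual composition function: the names `full_compose`, `diagonal_compose` and `param_counts` are hypothetical, and the dense baseline (one weight matrix over the concatenated children) is assumed for comparison only.

```python
# Toy comparison of a dense vs. a diagonal composition of two child vectors.
# ASSUMPTION: this is a generic illustration, not the Banyan implementation.

def full_compose(left, right, W):
    """Dense composition: parent = W @ concat(left, right).

    W is a list of d rows, each holding 2*d weights.
    """
    x = left + right  # list concatenation, length 2*d
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def diagonal_compose(left, right, a, b):
    """Diagonal composition: parent[i] = a[i] * left[i] + b[i] * right[i].

    Each dimension is mixed independently, so only 2*d weights are needed.
    """
    return [ai * li + bi * ri for ai, li, bi, ri in zip(a, left, b, right)]


def param_counts(d):
    """Number of weights in each composition function for embedding size d."""
    return {"full": d * 2 * d, "diagonal": 2 * d}


if __name__ == "__main__":
    left, right = [1.0, 2.0], [3.0, 4.0]
    # Equal weights on both children: an element-wise average.
    print(diagonal_compose(left, right, [0.5, 0.5], [0.5, 0.5]))
    print(param_counts(256))
```

In this toy the composition weights drop from 2d² to 2d. The 10x figure in the post refers to the whole model, so it will not match this ratio.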