Post
My favorite thing about my colleague's post on the scaling science behind Marin has to be the part where an open source contributor noticed an opportunity for improvement, quickly made a patch, and that idea made it into a big training run.
13d
Al Merose
In our latest blog post, Marin team member Larry Dial describes the pretraining techniques we're using as we transition from dense models to more efficient Mixture of Experts (MoE) models. This demos the stability and predictability of MoEs, giving us a promising direction for our next training run.
13d
Open Athena | Improving our LLM Pretraining Efficiency
Open Athena is a nonprofit that accelerates academia with capabilities from the AI frontier
openathena.ai
Open Athena