My favorite part of my colleague's post on the scaling science behind Marin is where an open source contributor spotted an opportunity for improvement, quickly wrote a patch, and saw that idea make it into a big training run.
Al Merose
In our latest blog post, Marin team member Larry Dial describes the pretraining techniques we're using as we transition from dense models to more efficient Mixture of Experts (MoE) models. The post demonstrates the stability and predictability of MoEs, which gives us a promising direction for our next training run.