New video! How do LLMs grow outrageously large yet stay blazingly fast?
The secret: Mixture of Experts (MoE)
In this video, we cover the role of feed-forward networks (FFNs), how to scale them without slowing the model down, and how to maintain load balance and training stability.
Full video here: youtu.be/0QQlYR1r6pQ