will be presented at ICML!

1mo

Tony S.F.

A new paper on how to scale LLM training as the token budget increases, based on convergence theory! Lots of empirical experiments validating the assumptions we make. arxiv.org/abs/2603.21191

2mo

We study the role of batch size in stochastic conditional gradient methods under a $μ$-Kurdyka-Łojasiewicz ($μ$-KL) condition. Focusing on momentum-based stochastic conditional gradient algorithms (e....

arxiv.org

On the Role of Batch Size in Stochastic Conditional Gradient Methods
