Can't wait to read this after the NeurIPS deadline: arxiv.org/pdf/2604.28006
Tony S.F.
will be presented at ICML!
my coauthors have convinced me that it's not the best decision to name our NonSmooth Frank-Wolfe algorithm NSFW... i thought it was catchy.
You can even (approximately) represent the Frank-Wolfe update in this geometry through suitably chosen Φ!
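For anyone who hasn't seen it, here is a minimal sketch of the classical Frank-Wolfe update this post refers to: call a linear minimization oracle (LMO) over the constraint set, then take a convex combination step. This is the textbook method, not the algorithm from the paper. The l1-ball constraint, the quadratic objective, the 2/(k+2) step size, and the names `lmo_l1_ball` and `frank_wolfe` are all assumptions chosen for illustration.

```python
def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: a minimizer of <grad, s> over the l1 ball."""
    # The minimizer is a signed vertex at the coordinate with largest |grad|.
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def frank_wolfe(grad_f, x0, steps=50, radius=1.0):
    """Textbook Frank-Wolfe with the open-loop step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(steps):
        g = grad_f(x)
        s = lmo_l1_ball(g, radius)
        gamma = 2.0 / (k + 2.0)
        # Convex combination keeps the iterate inside the ball (no projection).
        x = [xi + gamma * (si - xi) for xi, si in zip(x, s)]
    return x


# Illustrative problem (assumed): minimize 0.5 * ||x - b||^2 over the unit l1 ball.
b = [2.0, 0.5]
x_fw = frank_wolfe(lambda x: [xi - bi for xi, bi in zip(x, b)], [0.0, 0.0])
```

The iterate never leaves the feasible set because every step is a convex combination of feasible points, which is the property that makes the update projection-free.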
The key idea is to use Φ-convexity to measure "smoothness" relative to a reference function Φ rather than a norm. This is similar to relative smoothness, where ∇Φ* acts as a mirror map. We also show Polar Express approximates this map more closely than the ideal matrix sign.
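For comparison, the standard relative smoothness condition from the mirror descent literature can be written as below. This is the textbook definition, given only as a reference point; the Φ-convexity condition in the paper may differ, and the constant L and the Bregman divergence D_Φ are notation assumed here.

```latex
% Bregman divergence generated by a differentiable convex reference function \Phi
D_\Phi(y, x) = \Phi(y) - \Phi(x) - \langle \nabla \Phi(x),\, y - x \rangle

% f is L-smooth relative to \Phi if, for all x, y,
f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + L \, D_\Phi(y, x)

% Minimizing the right-hand side over y gives the mirror step,
% in which \nabla \Phi^* maps the dual update back to the primal space:
x^{+} = \nabla \Phi^{*}\!\left( \nabla \Phi(x) - \tfrac{1}{L} \nabla f(x) \right)
```

Taking Φ = ½‖·‖₂² gives D_Φ(y, x) = ½‖y − x‖₂², so the condition reduces to ordinary L-smoothness and the mirror step reduces to a gradient step.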
i got asked by a friend if my figures were made with chatgpt because he liked them. this time i could say no and show him a different talk with the same figures from before chatgpt, but it saddened me to think everyone will likely assume this from now on
What do you think of proofs that use color in this way?
A new paper on how to scale LLM training as the token budget increases, based on convergence theory! Lots of experiments validating the assumptions we make. arxiv.org/abs/2603.21191
New paper! We analyze proximal preconditioned gradient methods that extend Muon/Scion to handle nonconvex constraints (Stiefel manifold, spectral sphere, norm balls, ...) with convergence guarantees under heavy-tailed noise + variance reduction w/ STORM!
arxiv.org/abs/2605.11850
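Some background on the Muon-style step that this line of work builds on: the update direction is an approximation of the matrix sign (polar factor) of the gradient or momentum matrix, computed with a matrix polynomial iteration instead of an SVD. Below is a rough sketch using the classical cubic Newton-Schulz iteration. It is not the tuned polynomial used in Muon, not Polar Express, and not the proximal method from the paper. The function name `newton_schulz_sign`, the Frobenius normalization, and the iteration count are assumptions for illustration.

```python
import math


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_schulz_sign(G, iters=30):
    """Approximate the polar factor U V^T of G = U S V^T (G assumed full rank)."""
    fro = math.sqrt(sum(v * v for row in G for v in row))
    if fro == 0.0:
        raise ValueError("G must be nonzero")
    # Normalizing puts every singular value in (0, 1], inside the
    # convergence region of the cubic iteration.
    X = [[v / fro for v in row] for row in G]
    for _ in range(iters):
        # X <- 1.5 X - 0.5 X X^T X pushes each singular value toward 1.
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


# A symmetric positive definite test matrix: its polar factor is the identity.
G = [[3.0, 1.0], [1.0, 2.0]]
O = newton_schulz_sign(G)
```

A Muon-like step would then move the weights along `-lr * O` for some learning rate `lr`; small singular values converge slowly under this cubic iteration, which is one motivation for better-tuned polynomials.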
Also, a shoutout to this amazing paper by @tonysf.bsky.social and collaborators, which is well worth reading: arxiv.org/abs/2502.07529