Profile
Can't wait to read this after the NeurIPS deadline: arxiv.org/pdf/2604.28006
1mo
Tony S.F.
will be presented at ICML!
my coauthors have convinced me that it's not the best decision to name our NonSmooth Frank-Wolfe algorithm NSFW... i thought it was catchy.
You can even (approximately) represent the Frank-Wolfe update in this geometry through suitably chosen Φ!
The key idea is to use Φ-convexity to measure "smoothness" relative to a reference function Φ rather than a norm. This is similar to relative smoothness where ∇Φ* acts as a mirror map. We also show Polar Express approximates this map more closely than the ideal matrix sign.
i got asked by a friend if my figures were made with chatgpt because he liked them and, while for this time i could say no and show him a different talk with the same figures from before chatgpt, it saddened me to think everyone will likely assume this is the case from now on
What do you think of proofs that use color in this way?
A new paper on how to scale LLM training as the token budget increases, based on convergence theory! Lots of experiments validating the assumptions we make. arxiv.org/abs/2603.21191
1mo
New paper! We analyze proximal preconditioned gradient methods that extend Muon/Scion to handle nonconvex constraints (Stiefel manifold, spectral sphere, norm balls, ...) with convergence guarantees under heavy-tailed noise + variance reduction w/ STORM! arxiv.org/abs/2605.11850
1mo
Also, a shoutout to this amazing paper by @tonysf.bsky.social and collaborators, which is well worth reading: arxiv.org/abs/2502.07529
2mo