The weirdest observation: I generated movies visualizing the polytope boundaries for ReLU networks using Muon and AdamW.
Same experiment, same data, same random seed. The difference is the "crease pattern" that the optimizers produce.
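For anyone wondering what a "crease pattern" is here: a ReLU network is piecewise linear, and the creases are where some hidden unit switches on or off. Below is a minimal sketch of how such a picture could be rasterised, not the code behind the movies. It assumes a 2D input and uses random Gaussian weights as a hypothetical stand-in for a trained network; `random_layers`, `crease_image`, and the grid bounds are illustrative names and values.

```python
import random

def relu_activation_pattern(x, y, layers):
    """Return the on/off state of every hidden ReLU unit at input (x, y)."""
    h = [x, y]
    pattern = []
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        pattern.extend(p > 0 for p in pre)
        h = [max(p, 0.0) for p in pre]
    return tuple(pattern)

def random_layers(sizes, rng):
    """Hypothetical stand-in for a trained network: Gaussian random weights."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def crease_image(layers, n=64, lo=-2.0, hi=2.0):
    """Mark grid cells whose activation pattern differs from the right or lower neighbour."""
    step = (hi - lo) / (n - 1)
    pats = [[relu_activation_pattern(lo + j * step, lo + i * step, layers)
             for j in range(n)] for i in range(n)]
    img = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            right = j + 1 < n and pats[i][j] != pats[i][j + 1]
            down = i + 1 < n and pats[i][j] != pats[i + 1][j]
            img[i][j] = 1 if (right or down) else 0
    return img

rng = random.Random(0)
layers = random_layers([2, 8, 8], rng)  # 2 inputs, two hidden layers of 8 units
img = crease_image(layers)              # 64x64 grid of 0/1, 1 = crease pixel
```

Rendering one such image per training step would give one frame of a movie; the colouring in the real visualisation is presumably richer than this 0/1 mask.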
That's a good point.
I am not an expert on CHERI, but I have looked at pointer tagging a lot. Perhaps I can help?
Yeah, I think AdamW organizes the creases so that the left-hand visualisation is more blue, and has fewer small white dots.
That is pretty wild :)
www.faz.net/premium/digi...
I wrote a FAZ guest article.
That slide fits the arrival of LLMs *really* well.
The AdamW videos compress to 1/6th the size of the Muon videos. Something AdamW does makes its crease visualisation compress well, while Muon's does not. This is the weirdest observation ever.
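A rough way to put a number on "compresses well" for a single frame, as a sketch only: it uses `zlib` as a stand-in for a video codec (which also exploits similarity between frames), and the two frames below are synthetic illustrations, not data from either optimizer. `compressed_fraction` is a hypothetical helper name.

```python
import random
import zlib

def compressed_fraction(frame_bytes: bytes) -> float:
    """Compressed size divided by raw size; smaller means more compressible."""
    return len(zlib.compress(frame_bytes, 9)) / len(frame_bytes)

n = 64
# Synthetic frame with a regular structure: one vertical line every 8 pixels.
regular = bytes(1 if j % 8 == 0 else 0 for i in range(n) for j in range(n))
# Synthetic frame with no structure: independent random 0/1 pixels.
rng = random.Random(0)
noisy = bytes(rng.randrange(2) for _ in range(n * n))

print(compressed_fraction(regular), compressed_fraction(noisy))
```

The regular frame compresses to a much smaller fraction than the noisy one, which is the kind of gap a 6x difference in video size hints at. Whether the AdamW creases are actually more regular in this sense is exactly the open question.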
Confession time: I use agentic coding all day, every day. It makes me much more productive.
But I am also terrified of skill atrophy. I feel like I need to break out pen & paper to force myself to "weight-lift" mentally so I don't forget how to think.
How do y'all handle this?
The most insightful take on Mythos I've seen so far. Everyone should read this, especially those who are currently thinking through the possible regulatory responses.