Under He/LeCun inits, theory implies Kernel OR Unstable regimes as width→∞. Discrepancies (e.g. feature learning) are seen as finite-width effects.
Our #NeurIPS2025 spotlight refutes this: practical nets do not converge to kernel limits; feature learning persists as width→∞🧵