At the edge of this regime (where η ∝ 1/√m), there exists a well-defined infinite-width limit in which feature learning persists in all hidden layers.
This Feature Learning Limit closely matches the behavior of optimally tuned finite-width networks under cross-entropy (CE) loss. (6/10)
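The scaling η ∝ 1/√m fixes how the learning rate changes with width m only up to a constant. Setting that constant by tuning at one reference width gives η(m) = η(m₀)·√(m₀/m). Below is a minimal sketch of that rule. `lr_at_width`, the base width 256 and the base learning rate 0.1 are hypothetical choices for illustration, not values from the thread:

```python
import math

def lr_at_width(m: int, base_width: int, base_lr: float) -> float:
    """Rescale a learning rate tuned at base_width to width m.

    Assumes eta ∝ 1/sqrt(m), i.e. eta(m) = base_lr * sqrt(base_width / m).
    """
    if m <= 0 or base_width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * math.sqrt(base_width / m)

# Hypothetical reference point: lr 0.1 tuned at width 256.
for m in (256, 1024, 4096):
    eta = lr_at_width(m, 256, 0.1)
    # eta * sqrt(m) stays constant across widths under this scaling.
    print(m, eta, eta * math.sqrt(m))
```

Quadrupling the width halves the learning rate, and the product η·√m stays fixed at 0.1·√256 = 1.6.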