Homeostatic dimensional degeneracy during development strikes again!
Arseny Khakhalin
NEW PAPER. Why do larger networks train better?
"Because they contain more candidate *sub*networks that can learn the task" → lottery tickets
This popular explanation uses an appealing but misleading metaphor 🧵
We propose an intuitive alternative grounded in theory: escape dimensions