This is cool and makes a lot of sense. Reminds me of the theory of neutral networks in evolutionary theory, where networks of neutral genotype changes enable populations to traverse the fitness landscape without getting stuck at local optima.
NEW PAPER. Why do larger networks train better?
"Because they contain more candidate *sub*networks that can learn the task" → lottery tickets
This popular explanation uses an appealing but misleading metaphor 🧵
We propose an intuitive alternative grounded in theory: escape dimensions