Can we match self-supervised backpropagation (BP) using local learning rules? We show it is possible in our new paper, accepted at ICML. We achieve:
1. theoretical equivalence to BP in a controlled setup
2. a new SOTA for local learning across image datasets
3. the same performance as BP on multiple datasets

For the comp neuro readers: CLAPP++ is still a three-factor Hebbian plasticity rule:
plasticity = (neuromodulator) × (dendritic prediction) × (Hebbian term)
When there is direct feedback, the dendritic prediction comes from the top layer, matching findings in neuroscience experiments.
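The three-factor form above can be sketched in a few lines of NumPy. This is only an illustration of how the three factors multiply: the function name, the shapes, and the choice of an outer product of post- and presynaptic activity as the Hebbian term are assumptions made here, not the exact CLAPP++ update.

```python
import numpy as np

def three_factor_update(pre, post, dendritic_prediction, neuromodulator, lr=0.1):
    """Illustrative three-factor update (hypothetical helper, not the paper's rule).

    pre:                  presynaptic activity, shape (n_in,)
    post:                 postsynaptic activity, shape (n_out,)
    dendritic_prediction: one value per postsynaptic neuron, shape (n_out,)
    neuromodulator:       global scalar that gates or signs the update
    """
    hebbian = np.outer(post, pre)  # Hebbian term: post x pre, shape (n_out, n_in)
    # Each row (postsynaptic neuron) is scaled by its dendritic prediction,
    # and the whole update is scaled by the global neuromodulator.
    return lr * neuromodulator * dendritic_prediction[:, None] * hebbian

pre = np.array([1.0, 2.0])
post = np.array([1.0, 0.5])
prediction = np.array([2.0, -1.0])
dW = three_factor_update(pre, post, prediction, neuromodulator=1.0)
```

Setting `neuromodulator=0.0` switches plasticity off entirely, which is the gating role a third factor usually plays.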

In supervised setups, many local learning algorithms have been proposed that approximate the BP gradient in theory and approach BP performance on benchmarks. For self-supervised learning, however, the performance gap is larger, and we lack a theory for comparing gradients between local-SSL and BP.

The paper is on arXiv: arxiv.org/abs/2601.21683. This work was done together with my fantastic colleagues: @bellecguill.bsky.social, Ariane Delrocq, and Wulfram Gerstner. We thank members of LCN (@gerstnerlab.bsky.social), Bernd Illing, Xing Chen, and the reviewers for their insightful discussions.

Driven by these findings, we develop variants of local-SSL (CLAPP++). They reach the performance of BP baselines on CIFAR-10, STL-10, and Tiny-ImageNet, while also setting a new SOTA for local learning rules on these datasets and ImageNet. Bonus: 40-60% less GPU VRAM and shorter wall-clock time than BP.

In deep linear convnets, theory also indicates that 2D spatial dependence is necessary for local-SSL gradients to align with BP gradients. Empirically, both direct feedback and 2D spatial dependence improve the gradient similarity.
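One common way to quantify how similar two gradients are is the cosine similarity of the flattened gradient tensors. The post does not state which metric is used, so treating gradient similarity as cosine similarity is an assumption here, and `gradient_cosine` is a hypothetical helper name.

```python
import numpy as np

def gradient_cosine(grad_a, grad_b, eps=1e-12):
    """Cosine similarity between two gradients of the same shape.

    Assumed metric for illustration; 1 means same direction, 0 orthogonal,
    -1 opposite direction.
    """
    a = np.ravel(grad_a)
    b = np.ravel(grad_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

g_bp = np.array([[1.0, 0.0], [0.0, 2.0]])      # stand-in for a BP gradient
g_local = np.array([[2.0, 0.0], [0.0, 4.0]])   # stand-in for a local-rule gradient
similarity = gradient_cosine(g_bp, g_local)    # same direction, different scale
```

Cosine similarity ignores the gradient norm, so a local rule can score close to 1 even if its update is a rescaled version of the BP update.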

In deep linear networks with orthonormal feedforward weights, we prove that local-SSL and BP have identical gradient updates. If orthonormality is broken by shrinking layer widths, we prove that direct feedback from the last layer makes local-SSL gradients more similar to BP gradients.
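To see why shrinking a layer breaks orthonormality, here is a small NumPy check. It assumes that "orthonormal" means W^T W = I for a weight matrix W of shape (n_out, n_in); the exact condition used in the paper may differ, and `random_orthonormal_rows` is a hypothetical helper.

```python
import numpy as np

def random_orthonormal_rows(n_out, n_in, seed=0):
    """Return an (n_out, n_in) matrix with orthonormal rows, n_out <= n_in."""
    assert n_out <= n_in
    rng = np.random.default_rng(seed)
    # QR of a random square matrix gives an orthogonal matrix q.
    q, _ = np.linalg.qr(rng.standard_normal((n_in, n_in)))
    return q[:n_out, :]  # keep the first n_out rows

W_square = random_orthonormal_rows(4, 4)   # same width in and out
W_narrow = random_orthonormal_rows(2, 4)   # layer shrinks from 4 to 2 units

square_ok = np.allclose(W_square.T @ W_square, np.eye(4))   # W^T W = I holds
rows_ok = np.allclose(W_narrow @ W_narrow.T, np.eye(2))     # rows stay orthonormal
narrow_ok = np.allclose(W_narrow.T @ W_narrow, np.eye(4))   # W^T W = I fails
```

For the narrow layer, W^T W is a rank-2 projection rather than the identity, so information in the discarded directions cannot be recovered by the layer's transpose.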

We focus on a class of local learning rules, called local-SSL, that optimize self-supervised objectives at each layer rather than at the final output layer. Gradients are detached between layers, so no backward pass is needed. Examples are CLAPP (Illing et al., 2021) and Forward-Forward algorithms.
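A minimal sketch of layer-local training with detached gradients, for linear layers in NumPy. The local loss is a placeholder chosen only because its gradient is easy to write by hand (squared distance between the layer's outputs for two views of the same input); it is not the CLAPP objective, and on its own it would collapse without negative samples. All names are hypothetical.

```python
import numpy as np

def local_step(W, a, b, lr=0.1):
    """One update of a single layer from its own placeholder loss.

    loss = 0.5 * ||W a - W b||^2, so dloss/dW = (W (a - b)) (a - b)^T.
    """
    diff = a - b
    err = W @ diff
    loss = 0.5 * float(err @ err)
    grad = np.outer(err, diff)
    return W - lr * grad, loss

def train_step(weights, view_a, view_b, lr=0.1):
    """Update every layer from its local loss only; no backward pass across layers."""
    new_weights, losses = [], []
    a, b = view_a, view_b
    for W in weights:
        W_new, loss = local_step(W, a, b, lr)
        new_weights.append(W_new)
        losses.append(loss)
        # The next layer receives plain arrays as fixed inputs, which plays the
        # role of detaching: its loss never sends a gradient back to this layer.
        a, b = W @ a, W @ b
    return new_weights, losses

weights = [np.eye(2), np.eye(2)]
view_a = np.array([1.0, 0.0])
view_b = np.array([0.0, 0.0])
new_weights, losses = train_step(weights, view_a, view_b)
```

Because each layer only needs its own inputs and outputs to compute its update, the update for the first layer is unchanged if the later layers are replaced by anything else.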

Zihan Wu