1Cademy - Zero Weight Initialization in Feed-Forward Networks

Learn Before

Vanishing/exploding gradient
Symmetry Breaking in Deep Learning

Concept

Zero Weight Initialization in Feed-Forward Networks

If we initialize every weight matrix $W$ as a matrix of zeros, then all neurons within a given layer compute the same output and therefore receive exactly the same gradient $\frac{d\mathcal{L}}{dz}$. No matter how long the network is trained, gradient descent will update their parameters identically, preventing the neurons in a layer from learning distinct features. This failure to differentiate neuron weights is known as the symmetry problem. Furthermore, the gradient propagated backward to an earlier layer is multiplied by the weights of the layer after it; because those weights are zero, the earlier layers receive a gradient of exactly zero at the start of training, an extreme case of the vanishing gradient problem.
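The behaviour described above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: it assumes a hypothetical two-layer network (4 sigmoid hidden units, one linear output, squared-error loss) trained on random data, and all names (`backward`, `params`, `lr`) are illustrative choices rather than anything from the source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(params, X, y):
    """Gradients of L = mean((y_hat - y)^2) / 2 for a 2-layer network."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    # Forward pass
    a1 = sigmoid(X @ W1.T + b1)            # hidden activations, (n, hidden)
    y_hat = a1 @ W2.T + b2                 # linear output, (n, 1)
    # Backward pass
    d_out = (y_hat - y) / n                # dL/dy_hat
    dW2 = d_out.T @ a1                     # (1, hidden)
    db2 = d_out.sum(axis=0)                # (1,)
    # The gradient reaching the hidden layer is multiplied by W2
    d_z1 = (d_out @ W2) * a1 * (1.0 - a1)  # (n, hidden)
    dW1 = d_z1.T @ X                       # (hidden, n_in)
    db1 = d_z1.sum(axis=0)                 # (hidden,)
    return dW1, db1, dW2, db2

# Assumed toy data: 8 samples, 3 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))

# Zero initialization of every weight and bias
params = [np.zeros((4, 3)), np.zeros(4), np.zeros((1, 4)), np.zeros(1)]

# At the first step, W2 = 0 blocks all gradient to the hidden layer
first_grads = backward(params, X, y)
print("first dW1 is all zeros:", np.allclose(first_grads[0], 0.0))

# Plain gradient descent for a few steps
lr = 0.1
for _ in range(100):
    grads = backward(params, X, y)
    params = [p - lr * g for p, g in zip(params, grads)]

W1, b1, W2, b2 = params
# Every hidden neuron ends up with the same incoming and outgoing weights
rows_identical = np.allclose(W1, W1[0])
outgoing_identical = np.allclose(W2, W2[0, 0])
print("hidden neurons share identical weights:", rows_identical and outgoing_identical)
```

In this sketch the hidden layer behaves like a single neuron copied four times: the rows of `W1` stay equal to each other (up to floating-point error) after every update. Replacing the zero matrices with small random values, for example `rng.normal(scale=0.1, size=(4, 3))`, is one common way to break the symmetry.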

Updated 2026-05-10

Contributors are from: University of Michigan - Ann Arbor, Google
