Concept

Zero Weight Initialization in Feed-Forward Networks

If we initialize the weight matrix $W$ of every layer as a matrix of zeros, then all neurons within a given layer compute the same pre-activation $z$ and receive exactly the same gradient $\frac{d\mathcal{L}}{dz}$. No matter how long the network is trained, gradient descent will update their parameters identically, preventing the neurons from learning distinct features. This failure to differentiate neuron weights is known as the symmetry problem. Furthermore, the gradient propagated backward to an earlier layer is multiplied by the current layer's weights, so while those weights are zero the earlier layers receive a gradient of zero, which contributes to the vanishing gradient problem.
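The symmetry problem can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes a hypothetical one-hidden-layer network with sigmoid hidden units, a single linear output, a squared-error loss, and a toy XOR-style dataset. The names `train_zero_init`, `W1`, `W2`, and `first_grad_W1` are illustrative choices and do not come from the text above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_zero_init(data, n_in=2, n_hidden=3, lr=0.1, steps=100):
    """Full-batch gradient descent on a tiny MLP whose parameters all start at zero."""
    W1 = [[0.0] * n_in for _ in range(n_hidden)]  # hidden-layer weights, one row per neuron
    b1 = [0.0] * n_hidden
    W2 = [0.0] * n_hidden                         # output weights, one per hidden neuron
    b2 = 0.0
    first_grad_W1 = None

    for step in range(steps):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in data:
            # Forward pass.
            z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
            h = [sigmoid(z) for z in z1]
            y = sum(w * hj for w, hj in zip(W2, h)) + b2
            # Backward pass for the loss 0.5 * (y - t)^2.
            dy = y - t
            gb2 += dy
            for j in range(n_hidden):
                gW2[j] += dy * h[j]
                # dL/dz for hidden neuron j: scaled by the outgoing weight W2[j].
                dz = dy * W2[j] * h[j] * (1.0 - h[j])
                gb1[j] += dz
                for i in range(n_in):
                    gW1[j][i] += dz * x[i]
        if step == 0:
            first_grad_W1 = [row[:] for row in gW1]  # keep a copy of the first gradient
        # Gradient descent update.
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
    return W1, W2, first_grad_W1

# Toy XOR-style dataset (an assumption made for illustration only).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
W1, W2, first_grad_W1 = train_zero_init(data)

# Every hidden neuron ends up with the same incoming and outgoing weights.
print(all(row == W1[0] for row in W1), all(w == W2[0] for w in W2))
# The first gradient reaching the hidden layer is zero, because W2 is zero.
print(all(g == 0.0 for row in first_grad_W1 for g in row))
```

In this sketch the hidden neurons stay exact copies of one another because each one performs the same arithmetic on the same values at every step, so the hidden layer behaves like a single neuron regardless of its width. Replacing the zero initialization with small random values breaks the tie.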

Updated 2026-05-10

Tags

Data Science