1Cademy - Zero Weight Initialization in Feed-Forward Networks

Learn Before

Vanishing/exploding gradient
The Problem with Constant Initialization

Concept

Zero Weight Initialization in Feed-Forward Networks

If the weight matrix $\mathbf{W}$ of a neural network layer is initialized to all zeros, every neuron in that layer computes the same pre-activation (its bias), regardless of the input. Assuming identical biases and identically initialized outgoing weights, the gradient of the loss $\mathcal{L}$ with respect to the pre-activation vector, $\frac{\partial\mathcal{L}}{\partial\mathbf{z}}$, is then the same for every neuron in the layer. During gradient descent, the neurons' parameters therefore receive identical updates, so the neurons remain copies of one another and cannot learn distinct features; this is the symmetry problem. Furthermore, backpropagating the gradient to earlier layers involves multiplication by $\mathbf{W}^T$, and because $\mathbf{W}$ is zero, that product is exactly zero at initialization. Earlier layers therefore receive no learning signal at first, which is an extreme case of the vanishing gradient problem.
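Both effects can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes a tiny network with one sigmoid hidden layer of two neurons, a linear output, a squared-error loss, and a single made-up training example. The names `forward_backward` and `train_zero_init`, the learning rate, and the data are all illustrative assumptions, not part of the original text.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward_backward(W1, b1, W2, b2, x, y):
    """One forward and backward pass for a 3 -> 2 -> 1 network (illustrative)."""
    # Hidden layer: z1 = W1 x + b1, a1 = sigmoid(z1)
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [sigmoid(z) for z in z1]
    # Linear output and squared-error loss 0.5 * (z2 - y)^2
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    dz2 = z2 - y
    dW2 = [dz2 * a for a in a1]
    db2 = dz2
    # Backpropagate through W2^T, then through the sigmoid derivative a(1 - a)
    da1 = [w * dz2 for w in W2]
    dz1 = [da * a * (1.0 - a) for da, a in zip(da1, a1)]
    dW1 = [[dz * xi for xi in x] for dz in dz1]
    return dz1, dW1, dz1, dW2, db2


def train_zero_init(steps=5, lr=0.5):
    """Run plain gradient descent starting from all-zero weights and biases."""
    x, y = [1.0, -2.0, 0.5], 1.0           # made-up training example
    W1 = [[0.0, 0.0, 0.0] for _ in range(2)]
    b1 = [0.0, 0.0]
    W2 = [0.0, 0.0]
    b2 = 0.0
    first_dz1 = None
    for step in range(steps):
        dz1, dW1, db1, dW2, db2 = forward_backward(W1, b1, W2, b2, x, y)
        if step == 0:
            first_dz1 = dz1                 # hidden-layer gradient at initialization
        W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
        b1 = [b - lr * g for b, g in zip(b1, db1)]
        W2 = [w - lr * g for w, g in zip(W2, dW2)]
        b2 -= lr * db2
    return first_dz1, W1, W2


if __name__ == "__main__":
    first_dz1, W1, W2 = train_zero_init()
    print("hidden gradient at step 0:", first_dz1)
    print("hidden weight rows after training:", W1)
    print("output weights after training:", W2)
```

In this sketch the hidden-layer gradient at the first step is zero because it is multiplied by the all-zero output weights. After a few updates the weights are no longer zero, but the two hidden neurons still have identical weight rows, so the symmetry is never broken.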

Updated 2026-07-06

Contributors are from: University of Michigan - Ann Arbor, Google
