Concept

What happens if we initialize the weights of a feed forward network to 0s?

If we initialize the weight matrices $W$ to all zeros, then every neuron in a layer computes the same output in the forward pass, and in backpropagation $\frac{\partial\mathcal L}{\partial z}$ takes the same value for every neuron in the layer. Gradient descent therefore applies the same update to every neuron, so the neurons remain identical copies of each other no matter how long you train — each layer effectively behaves like a single neuron. This is the symmetry problem, and random initialization is what breaks the symmetry.
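A minimal numpy sketch of the problem (toy layer sizes and sigmoid activation chosen for illustration): with zero-initialized weights, every column of the hidden weight matrix receives the same gradient at every step, so the hidden neurons never differentiate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features (toy data)
y = rng.normal(size=(4, 1))          # regression targets

W1 = np.zeros((3, 5))                # hidden layer, initialized to zeros
W2 = np.zeros((5, 1))                # output layer, initialized to zeros

for step in range(3):
    h = sigmoid(x @ W1)              # every hidden neuron outputs the same value
    out = h @ W2
    d_out = 2 * (out - y) / len(x)   # gradient of the MSE loss
    dW2 = h.T @ d_out                # identical gradient for every hidden unit
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1 = x.T @ d_h                  # identical columns, one per hidden neuron
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

# The weights have moved, but all hidden neurons are still identical:
print(np.allclose(W1, W1[:, [0]]))  # True — the symmetry was never broken
```

Initializing `W1` with small random values instead (e.g. `rng.normal(scale=0.01, size=(3, 5))`) gives each neuron a different gradient from the first step, which is exactly what breaks the symmetry.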


Updated 2021-03-19

Tags

Data Science