Learn Before
Concept

How to Initialize Weights to Prevent Vanishing/Exploding Gradients

To prevent the gradients of the network’s activations from vanishing or exploding, we will stick to the following rules of thumb:

  • The mean of the activations should be zero.
  • The variance of the activations should stay the same across every layer.

Under these two assumptions, the backpropagated gradient signal should not be multiplied by values too small or too large in any layer.

Keeping the activations zero-mean and preserving their variance at the input of every layer prevents the signal from exploding or vanishing as it propagates through the network.
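One standard scheme that satisfies both rules is Glorot (Xavier) initialization, which scales the weight variance by the layer's fan-in and fan-out. The following is a minimal NumPy sketch (the function name `glorot_uniform` and the layer sizes are illustrative) showing that, with this scaling, the activation variance stays roughly constant even after many layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: Var(w) = 2 / (fan_in + fan_out),
    # achieved by sampling from U(-limit, +limit) with limit below.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Push zero-mean, unit-variance inputs through a deep linear stack.
x = rng.standard_normal((1000, 256))
h = x
for _ in range(10):
    W = glorot_uniform(256, 256)
    h = h @ W

# After 10 layers the variance is still near 1 -- it neither
# vanished toward 0 nor exploded.
print(float(h.var()))
```

With a naive scale (e.g. `rng.standard_normal((256, 256))`, so Var(w) = 1), the variance would instead be multiplied by roughly the fan-in at every layer and explode within a few layers.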

Updated 2021-03-19

Tags

Data Science