How to Initialize Weights to Prevent Vanishing/Exploding Gradients
To prevent the gradients of the network’s activations from vanishing or exploding, we will stick to the following rules of thumb:
- The mean of the activations should be zero.
- The variance of the activations should stay the same across every layer.
Under these two assumptions, the backpropagated gradient signal is not repeatedly multiplied by values that are too small or too large in any layer. In other words, ensuring a zero mean and preserving the variance of the input to every layer keeps the signal from exploding or vanishing. A common way to satisfy these rules is shown in the sketch below.
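As a minimal illustrative sketch (not the only valid scheme), the snippet below uses NumPy to draw zero-mean weights with variance 1/fan_in, a Xavier-style choice, and then checks that the activation variance stays roughly constant as data passes through several linear layers. The layer sizes and the omission of nonlinearities are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer_weights(fan_in, fan_out):
    # Zero-mean Gaussian with variance 1/fan_in (Xavier-style), so the
    # variance of a layer's output roughly matches the variance of its input.
    std = np.sqrt(1.0 / fan_in)
    W = rng.normal(0.0, std, size=(fan_in, fan_out))
    b = np.zeros(fan_out)  # zero biases keep the activation mean at zero
    return W, b

# Sanity check: push unit-variance inputs through several linear layers
# and watch the variance stay near 1 instead of shrinking toward 0
# (vanishing) or blowing up (exploding).
x = rng.normal(0.0, 1.0, size=(1000, 256))
for _ in range(5):
    W, b = init_layer_weights(x.shape[1], 256)
    x = x @ W + b
    print(round(float(x.var()), 3))
```

Printing the variance after each layer should give values close to 1.0; initializing with a much larger or smaller standard deviation instead would make the variance grow or shrink exponentially with depth.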