How to Initialize Weights to Prevent Vanishing/Exploding Gradients
To keep gradients from vanishing or exploding, weight initialization schemes aim to satisfy two conditions: the activations of each layer should have zero mean, and their variance should stay roughly constant from layer to layer. When both conditions hold, the backpropagated gradient signal is not repeatedly multiplied by values much smaller or much larger than one, so it neither shrinks toward zero nor grows without bound as it passes through the layers. Zero mean and constant variance at initialization therefore help keep the gradient signal stable throughout the network, although they do not guarantee stability once training begins to change the weights.
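The constant-variance condition can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a stack of purely linear layers (no nonlinearity), zero-mean Gaussian weights, and NumPy. The helper name `forward_variances` and the chosen width, depth, and scales are hypothetical values picked for illustration. The Xavier-style scale used here is the standard deviation sqrt(2 / (n_in + n_out)).

```python
import numpy as np

def forward_variances(std_fn, n_layers=20, width=256, batch=1024, seed=0):
    """Push random inputs through a stack of linear layers and record
    the variance of the activations after each layer.

    std_fn(n_in, n_out) returns the standard deviation used to draw
    the zero-mean Gaussian weights (hypothetical helper signature).
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))  # zero mean, unit variance inputs
    variances = []
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * std_fn(width, width)
        h = h @ W  # linear layer only; nonlinearities are omitted on purpose
        variances.append(float(h.var()))
    return variances

# Each layer multiplies the variance by about width * std**2 in expectation.
too_small = forward_variances(lambda n_in, n_out: 0.01)  # factor below 1: variance collapses
too_large = forward_variances(lambda n_in, n_out: 0.1)   # factor above 1: variance blows up
xavier = forward_variances(lambda n_in, n_out: np.sqrt(2.0 / (n_in + n_out)))  # factor near 1

for name, v in [("too small", too_small), ("too large", too_large), ("xavier", xavier)]:
    print(f"{name:>9}: variance after first layer {v[0]:.3e}, after last layer {v[-1]:.3e}")
```

With the two fixed scales the activation variance changes by many orders of magnitude over 20 layers, while the Xavier-style scale keeps it close to its starting value. The same multiplicative argument applies to the gradient in the backward pass, which is why the choice of scale matters for vanishing and exploding gradients.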
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization
Built-in Gaussian Parameter Initialization
Constant Parameter Initialization
Block-Specific Parameter Initialization
Forced Parameter Reinitialization
Custom Parameter Initialization
Direct Parameter Assignment
Lazy Parameter Initialization