Xavier Initialization
Xavier initialization, named after Xavier Glorot, who proposed it with Yoshua Bengio (Glorot and Bengio, 2010), is a standard technique for mitigating vanishing and exploding gradients by carefully choosing the initial weights of a neural network layer. To balance the variance of activations in forward propagation against the variance of gradients in backward propagation, it typically samples weights from a Gaussian distribution with mean $0$ and variance $\sigma^2 = \frac{2}{n_\text{in} + n_\text{out}}$, where $n_\text{in}$ and $n_\text{out}$ are the number of inputs and outputs of the layer, respectively. Although the underlying derivation assumes linear activations, an assumption that is often violated in practice, the method has proven highly effective.
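As a rough illustration of the sampling rule described above, the following NumPy sketch draws a weight matrix with mean 0 and variance 2 / (n_in + n_out). The function name xavier_normal, the layer sizes, and the seed are hypothetical choices made for this example; they are not taken from any particular library. The check at the end uses a square, purely linear layer, where the rule is expected to keep the output variance close to the input variance.

```python
import numpy as np

def xavier_normal(n_in, n_out, rng=None):
    """Sample an (n_in, n_out) weight matrix from N(0, 2 / (n_in + n_out)).

    Hypothetical helper written for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (n_in + n_out))  # standard deviation = sqrt(variance)
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

# Assumed layer sizes and seed, chosen only to make the example reproducible.
rng = np.random.default_rng(0)
n_in, n_out = 256, 256
W = xavier_normal(n_in, n_out, rng)
target_var = 2.0 / (n_in + n_out)

# Empirical variance of the sampled weights next to the target variance.
print("weight variance:", W.var(), "target:", target_var)

# Forward pass through a linear layer (no activation) with unit-variance inputs.
x = rng.normal(size=(2000, n_in))
y = x @ W
print("input variance:", x.var(), "output variance:", y.var())
```

When n_in and n_out differ, the output variance of such a linear layer scales by n_in * 2 / (n_in + n_out) rather than staying at 1, which reflects the compromise between the forward and backward passes that the rule makes.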
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning
Variance of Layer Output in Forward Propagation
Default Random Initialization
Xavier Initialization
Built-in Gaussian Parameter Initialization
Constant Parameter Initialization
Block-Specific Parameter Initialization
Forced Parameter Reinitialization
Custom Parameter Initialization
Direct Parameter Assignment
Lazy Parameter Initialization
How to Initialize Weights to Prevent Vanishing/Exploding Gradients