Learn Before
Deep Learning Weight Initialization
Xavier/He Initialization
Xavier and He initialization are common initialization techniques for dealing with vanishing/exploding gradients. They normalize the weight initialization process so that the variance of the signal is maintained across the layers. Although the two techniques serve the same purpose, they are paired with different activation functions: Xavier initialization is used with the tanh function, while He initialization is used with the ReLU activation function. Using Xavier initialization with ReLU can give very poor results. In the case of Xavier initialization, the weights of a layer are scaled by the factor sqrt(1 / n), where n is the number of units feeding into the layer.
In the case of He initialization, the scaling factor is sqrt(2 / n).
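As a rough sketch of how these scaling factors are applied in practice (the helper function, layer sizes, and seed below are made up for illustration; NumPy is assumed):

```python
import numpy as np

def initialize_weights(n_in, n_out, method="he", seed=0):
    """Return a weight matrix of shape (n_out, n_in) drawn from a
    zero-mean Gaussian, scaled according to the chosen scheme."""
    rng = np.random.default_rng(seed)
    if method == "xavier":      # suited to tanh activations
        scale = np.sqrt(1.0 / n_in)
    elif method == "he":        # suited to ReLU activations
        scale = np.sqrt(2.0 / n_in)
    else:
        raise ValueError(f"unknown method: {method}")
    return rng.normal(loc=0.0, scale=scale, size=(n_out, n_in))

# Example: a layer with 512 inputs and 256 units
W_tanh = initialize_weights(512, 256, method="xavier")
W_relu = initialize_weights(512, 256, method="he")
print(W_tanh.std(), W_relu.std())  # roughly sqrt(1/512) and sqrt(2/512)
```

Deep learning frameworks already ship these schemes under names like Glorot/Xavier and He/Kaiming initializers, so the sketch above is only meant to make the scaling factors explicit.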
Tags
Data Science
Related
Xavier/He Initialization
Example of Weight Initialization
Vanishing/exploding gradient
Symmetry Breaking in Deep Learning
How to Initialize Weights to Prevent Vanishing/Exploding Gradients
Transfer Learning in Deep Learning
Multi-task Learning in Deep Learning