Concept

Xavier Initialization

Xavier initialization, named after Xavier Glorot, who proposed it with Yoshua Bengio (2010), is a standard technique for mitigating vanishing and exploding gradients by carefully setting a layer's initial weights. Preserving activation variance in the forward pass calls for a weight variance of $1/n_\textrm{in}$, while preserving gradient variance in the backward pass calls for $1/n_\textrm{out}$; Xavier initialization compromises between the two, typically sampling weights from a Gaussian distribution with mean $0$ and variance $\sigma^2 = \frac{2}{n_\textrm{in} + n_\textrm{out}}$, where $n_\textrm{in}$ and $n_\textrm{out}$ denote the number of inputs and outputs of the layer, respectively. Although the derivation assumes linear activations, an assumption often violated in practice, the method has proven highly effective.
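
As a concrete illustration, here is a minimal NumPy sketch of the Gaussian variant described above; the helper name `xavier_init` is illustrative, not from the text.

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Draw an (n_in, n_out) weight matrix from N(0, 2 / (n_in + n_out))."""
    # 1/n_in would preserve activation variance in the forward pass;
    # 1/n_out would preserve gradient variance in the backward pass;
    # 2/(n_in + n_out) splits the difference between the two.
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(loc=0.0, scale=sigma, size=(n_in, n_out))

# Weights for a fully connected layer with 256 inputs and 128 outputs.
W = xavier_init(256, 128)
print(W.shape, W.std())  # (256, 128); empirical std near sqrt(2/384) ~ 0.072
```

In practice, deep learning frameworks provide this out of the box, e.g. PyTorch's `torch.nn.init.xavier_normal_` (and a uniform-distribution counterpart, `xavier_uniform_`).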


Updated 2026-05-06

Tags

Data Science

D2L

Dive into Deep Learning @ D2L