Learn Before
  • Deep Learning Weight Initialization

Xavier/He Initialization

Xavier and He initialization are common techniques for dealing with vanishing/exploding gradients. These strategies normalize the weight initialization process so that the variance of the activations (and gradients) stays roughly constant across layers. Although the two techniques have the same purpose, they are meant for different activation functions: Xavier initialization is used with the tanh function, while He initialization is used with the ReLU activation function. Using Xavier initialization with ReLU can give very poor results. Xavier initialization scales the randomly drawn weights of layer $l$ by the factor $\sqrt{\frac{1}{\text{layers\_dims}[l-1]}}$

He initialization uses the factor $\sqrt{\frac{2}{\text{layers\_dims}[l-1]}}$, where $\text{layers\_dims}[l-1]$ is the number of units in the previous layer.
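As a rough illustration, here is a minimal NumPy sketch (the helper name initialize_parameters and the list layers_dims holding the layer sizes are assumptions for this example, not part of the original card) that applies either scaling factor to randomly drawn weights:

import numpy as np

def initialize_parameters(layers_dims, method="he"):
    # layers_dims[l] is the number of units in layer l; weights of layer l are
    # scaled by sqrt(1/n) for Xavier or sqrt(2/n) for He, where
    # n = layers_dims[l-1] is the number of units feeding into that layer.
    parameters = {}
    for l in range(1, len(layers_dims)):
        fan_in = layers_dims[l - 1]
        scale = np.sqrt(1.0 / fan_in) if method == "xavier" else np.sqrt(2.0 / fan_in)
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], fan_in) * scale
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

# Example: 5 inputs, one hidden layer of 4 ReLU units, 1 output unit
params = initialize_parameters([5, 4, 1], method="he")

Drawing each weight from a standard normal and multiplying by the chosen factor keeps the variance of a layer's outputs roughly independent of how many units feed into it.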


Tags

Data Science

Related
  • Xavier/He Initialization

  • Example of Weight Initialization

  • Vanishing/exploding gradient

  • Symmetry Breaking in Deep Learning

  • How to Initialize Weights to Prevent Vanishing/Exploding Gradients

  • Transfer Learning in Deep Learning

  • Multi-task Learning in Deep Learning