1Cademy - Xavier Initialization

Xavier Initialization

Xavier initialization is named after Xavier Glorot, who introduced it with Yoshua Bengio (Glorot and Bengio, 2010). It is a standard technique for mitigating vanishing and exploding gradients by carefully setting the initial weights of a neural network layer. Keeping the variance of the activations constant in the forward pass requires $n_\textrm{in} \sigma^2 = 1$, while keeping the variance of the gradients constant in the backward pass requires $n_\textrm{out} \sigma^2 = 1$. Both conditions cannot hold at once unless $n_\textrm{in} = n_\textrm{out}$, so the method compromises: it typically samples weights from a Gaussian distribution with a mean of $0$ and a variance of $\sigma^2 = \frac{2}{n_\textrm{in} + n_\textrm{out}}$, where $n_\textrm{in}$ and $n_\textrm{out}$ are the number of inputs and outputs of the layer, respectively. Although the derivation assumes linear activations, an assumption that is often violated in practice, the method has proven highly effective.
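As a rough illustration of the sampling rule, the following minimal sketch draws a weight matrix from a zero-mean Gaussian with variance $\frac{2}{n_\textrm{in} + n_\textrm{out}}$ and checks the empirical statistics. The function name `xavier_normal`, the `seed` parameter, and the layer sizes 256 and 128 are assumptions chosen for illustration; they do not come from the original text or from any particular library.

```python
import math
import random


def xavier_normal(n_in, n_out, seed=None):
    """Return an n_in x n_out list of lists sampled from N(0, 2 / (n_in + n_out)).

    Hypothetical helper for illustration only, not a library API.
    """
    rng = random.Random(seed)
    # Standard deviation is the square root of the Xavier variance.
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


# Example layer sizes (assumed values).
n_in, n_out = 256, 128
weights = xavier_normal(n_in, n_out, seed=0)

# Compare the empirical mean and variance with the target values.
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
variance = sum((w - mean) ** 2 for w in flat) / len(flat)
target_variance = 2.0 / (n_in + n_out)

print(f"empirical mean:     {mean:.6f}")
print(f"empirical variance: {variance:.6f}")
print(f"target variance:    {target_variance:.6f}")
```

With this many samples, the empirical variance should land close to the target, and the empirical mean close to $0$. In practice, deep learning frameworks ship their own Xavier (Glorot) initializers, which would normally be preferred over a hand-written version like this one.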