Concept

Xavier Initialization

Xavier initialization, named after Xavier Glorot, who proposed it with Yoshua Bengio (2010), is a standard technique for mitigating vanishing and exploding gradients by carefully setting a layer's initial weights. Preserving activation variance in the forward pass calls for a weight variance of $1/n_\textrm{in}$, while preserving gradient variance in the backward pass calls for $1/n_\textrm{out}$; Xavier initialization compromises between the two, typically sampling weights from a Gaussian distribution with mean $0$ and variance $\sigma^2 = \frac{2}{n_\textrm{in} + n_\textrm{out}}$, where $n_\textrm{in}$ and $n_\textrm{out}$ denote the number of inputs and outputs of the layer, respectively. Although the derivation assumes linear activations, an assumption often violated in practice, the method has proven highly effective.
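
As a concrete illustration, here is a minimal NumPy sketch of the Gaussian variant described above; the helper name `xavier_init` is illustrative, not from the text.

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Draw an (n_in, n_out) weight matrix from N(0, 2 / (n_in + n_out))."""
    # 1/n_in would preserve activation variance in the forward pass;
    # 1/n_out would preserve gradient variance in the backward pass;
    # 2/(n_in + n_out) splits the difference between the two.
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(loc=0.0, scale=sigma, size=(n_in, n_out))

# Weights for a fully connected layer with 256 inputs and 128 outputs.
W = xavier_init(256, 128)
print(W.shape, W.std())  # (256, 128); empirical std near sqrt(2/384) ~ 0.072
```

In practice, deep learning frameworks provide this out of the box, e.g. PyTorch's `torch.nn.init.xavier_normal_` (and a uniform-distribution counterpart, `xavier_uniform_`).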


Updated 2026-05-06

Tags

Data Science

D2L

Dive into Deep Learning @ D2L