Formula

Xavier Initialization Condition

When initializing network weights, we face a dilemma: to keep variance fixed during forward propagation, we need $n_\textrm{in} \sigma^2 = 1$, but for backpropagation, we need $n_\textrm{out} \sigma^2 = 1$. It is generally impossible to satisfy both conditions simultaneously unless the number of inputs equals the number of outputs. As a practical compromise, we try to satisfy the average of the two conditions: $\frac{1}{2} (n_\textrm{in} + n_\textrm{out}) \sigma^2 = 1$. This simplifies to the target weight standard deviation $\sigma = \sqrt{\frac{2}{n_\textrm{in} + n_\textrm{out}}}$, which forms the mathematical condition for Xavier initialization.
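As a minimal sketch of how this condition translates into code, the snippet below draws a weight matrix from a Gaussian with the Xavier standard deviation and checks the empirical spread; the helper name `xavier_std` and the layer sizes are illustrative choices, not from the source.

```python
import numpy as np

def xavier_std(n_in, n_out):
    """Xavier/Glorot target standard deviation: sqrt(2 / (n_in + n_out))."""
    return np.sqrt(2.0 / (n_in + n_out))

# Hypothetical layer sizes, chosen only for illustration.
n_in, n_out = 256, 128
rng = np.random.default_rng(seed=0)

# Sample weights from a zero-mean Gaussian with the Xavier std.
W = rng.normal(loc=0.0, scale=xavier_std(n_in, n_out), size=(n_in, n_out))

print(W.std())  # close to sqrt(2 / 384) ~= 0.0722
```

A uniform variant works the same way: draw from $U(-a, a)$ with $a = \sqrt{\frac{6}{n_\textrm{in} + n_\textrm{out}}}$, which has the same variance $\frac{2}{n_\textrm{in} + n_\textrm{out}}$.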


Tags

D2L

Dive into Deep Learning @ D2L
