Learn Before
Formula
Xavier Initialization Condition
When initializing network weights, we face a dilemma: to keep the variance fixed during forward propagation, we need $n_\text{in} \sigma^2 = 1$, but for backpropagation, we need $n_\text{out} \sigma^2 = 1$. It is generally impossible to satisfy both conditions simultaneously unless the number of inputs equals the number of outputs ($n_\text{in} = n_\text{out}$). As a practical compromise, we try to satisfy the average of the two conditions: $\frac{1}{2}(n_\text{in} + n_\text{out})\sigma^2 = 1$. This simplifies to the target weight standard deviation $\sigma = \sqrt{2 / (n_\text{in} + n_\text{out})}$, which forms the mathematical condition for Xavier initialization.
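The condition above can be sketched in a few lines of NumPy: draw weights from a zero-mean Gaussian whose standard deviation follows the Xavier rule, then check that the sample standard deviation matches the target. The layer sizes here are illustrative, not from the source.

```python
import numpy as np

# Illustrative (hypothetical) layer sizes
n_in, n_out = 256, 128

# Xavier condition: sigma = sqrt(2 / (n_in + n_out))
sigma = np.sqrt(2.0 / (n_in + n_out))

# Draw a weight matrix from N(0, sigma^2)
rng = np.random.default_rng(0)
W = rng.normal(loc=0.0, scale=sigma, size=(n_out, n_in))

# The empirical standard deviation should be close to the target sigma
print(f"target sigma = {sigma:.4f}, empirical std = {W.std():.4f}")
```

Note that D2L also describes a uniform variant; the Gaussian draw here is just one way to satisfy the same variance condition.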
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L