Theory
Variance of Gradients in Backpropagation
During backpropagation through a fully connected layer without nonlinearities, the network faces a variance scaling problem analogous to that of forward propagation. Gradients propagating backward from layers closer to the output can blow up or vanish exponentially with depth. Applying the same statistical reasoning used for the forward pass, each component of the backward-propagated gradient is a sum of $n_\text{out}$ terms, so its variance is multiplied by $n_\text{out}\sigma^2$ at every layer. To keep the variance of these gradients fixed, the weight variance must therefore satisfy the condition $n_\text{out}\sigma^2 = 1$, where $n_\text{out}$ is the number of outputs of that specific layer.
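A minimal NumPy sketch of this effect (the layer width, depth, and alternative standard deviations below are illustrative assumptions, not values from the source): pushing a unit-variance gradient backward through a stack of random linear layers preserves its variance only when $\sigma^2 = 1/n_\text{out}$; smaller values make it vanish and larger ones make it explode.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256     # layer width, so n_out = n for every layer (illustrative choice)
depth = 50  # number of linear layers to backpropagate through (illustrative)

def grad_variance_after_backprop(weight_std):
    """Variance of a unit-variance gradient after `depth` backward steps."""
    g = rng.normal(0.0, 1.0, size=n)  # gradient w.r.t. the final layer's output
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(n, n))
        g = W.T @ g  # backward pass of y = Wx multiplies the gradient by W^T
    return g.var()

# sigma^2 = 1/n_out keeps the variance near 1; other choices drift exponentially
for std in (np.sqrt(1.0 / n), 0.05, 0.08):
    print(f"weight std {std:.4f} -> gradient variance after {depth} layers: "
          f"{grad_variance_after_backprop(std):.3e}")
```

With $n = 256$, the matched choice $\sigma = \sqrt{1/256} \approx 0.0625$ keeps the gradient variance near 1, while the nearby values 0.05 and 0.08 shrink or amplify it by a factor of $n\sigma^2$ per layer, which compounds exponentially over 50 layers.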
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L