Theory

Variance of Gradients in Backpropagation

During backpropagation through a fully connected layer without nonlinearities, the network faces a variance scaling problem analogous to the one in forward propagation: gradients propagating backward from layers closer to the output can blow up or vanish exponentially. The gradient with respect to each input is a sum of $n_\textrm{out}$ terms, each the product of a weight and an output gradient, so its variance is multiplied by $n_\textrm{out}\sigma^2$ at every layer. Applying the same statistical reasoning used for the forward pass, keeping the variance of these gradients fixed requires the weight variance $\sigma^2$ to satisfy $n_\textrm{out}\sigma^2 = 1$, where $n_\textrm{out}$ is the number of outputs of that layer.
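The following minimal NumPy sketch (not from the book; layer width and depth are arbitrary illustrative choices) checks this condition numerically: a gradient is pushed backward through a stack of linear layers, and its variance stays near 1 only when the weights are drawn with variance $1/n_\textrm{out}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = n_out = 512     # illustrative layer width (square layers so depth can vary)
num_layers = 50        # illustrative depth

def backprop_grad_variance(sigma2):
    """Variance of the gradient after backpropagating through num_layers
    fully connected layers (no nonlinearity) with weight variance sigma2."""
    grad = rng.standard_normal(n_out)      # gradient arriving from the layer above
    for _ in range(num_layers):
        W = rng.normal(0.0, np.sqrt(sigma2), size=(n_out, n_in))
        grad = W.T @ grad                  # backward pass of y = W x gives dL/dx = W^T dL/dy
    return grad.var()

print(backprop_grad_variance(1.0 / n_out))  # roughly 1: variance preserved since n_out * sigma^2 = 1
print(backprop_grad_variance(2.0 / n_out))  # grows roughly like 2^num_layers: gradients blow up
```

Choosing a smaller variance, e.g. $0.5/n_\textrm{out}$, shows the opposite failure: the gradient variance decays toward zero with depth, which is the vanishing-gradient side of the same scaling problem.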
