Learn Before
Formula

Variance of Layer Output in Forward Propagation

When analyzing the scale distribution of an output $o_i$ of a fully connected layer without nonlinearities, the output is computed as $o_i = \sum_{j=1}^{n_\textrm{in}} w_{ij} x_j$. Assuming the inputs $x_j$ and weights $w_{ij}$ are drawn independently with mean $0$ and variances $\gamma^2$ and $\sigma^2$ respectively, the expected value $E[o_i]$ is $0$. We can compute the variance as $$\textrm{Var}[o_i] = E[o_i^2] - (E[o_i])^2 = \sum_{j=1}^{n_\textrm{in}} E[w_{ij}^2 x_j^2] - 0 = \sum_{j=1}^{n_\textrm{in}} E[w_{ij}^2]\, E[x_j^2] = n_\textrm{in} \sigma^2 \gamma^2.$$ Note that the distributions do not have to be Gaussian; only the mean and variance must exist. To keep this variance fixed during forward propagation and prevent it from changing across layers, the initialization must satisfy the condition $n_\textrm{in} \sigma^2 = 1$.
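A quick simulation can illustrate the derivation. The sketch below (dimensions, batch size, and the Gaussian sampling are illustrative assumptions, not from the text) draws weights with $\sigma^2 = 1/n_\textrm{in}$ so that $n_\textrm{in}\sigma^2 = 1$, and checks that the output variance stays close to the input variance $\gamma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the sketch.
n_in, n_out, batch = 256, 256, 10_000
gamma = 1.0                      # input standard deviation
sigma = (1.0 / n_in) ** 0.5      # weight std chosen so n_in * sigma^2 = 1

# Inputs and weights drawn independently with mean 0.
x = rng.normal(0.0, gamma, size=(batch, n_in))
w = rng.normal(0.0, sigma, size=(n_in, n_out))

# Fully connected layer without nonlinearity: o_i = sum_j w_ij * x_j.
o = x @ w

# Empirically, Var[o] ≈ n_in * sigma^2 * gamma^2 = gamma^2 = 1,
# so the variance is preserved through the layer.
print(np.var(x), np.var(o))
```

With any other choice of $\sigma^2$ the printed output variance would be scaled by $n_\textrm{in}\sigma^2$ per layer, which is exactly the drift the condition prevents.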


Updated 2026-05-06


Tags

D2L

Dive into Deep Learning @ D2L