Learn Before
Formula

Variance of Layer Output in Forward Propagation

When analyzing the scale distribution of an output $o_i$ of a fully connected layer without nonlinearities, the output is computed as $o_i = \sum_{j=1}^{n_\textrm{in}} w_{ij} x_j$. Assuming the inputs $x_j$ and weights $w_{ij}$ are drawn independently with mean $0$ and variances $\gamma^2$ and $\sigma^2$ respectively, the expected value $E[o_i]$ is $0$. We can compute the variance as $$\textrm{Var}[o_i] = E[o_i^2] - (E[o_i])^2 = \sum_{j=1}^{n_\textrm{in}} E[w_{ij}^2 x_j^2] - 0 = \sum_{j=1}^{n_\textrm{in}} E[w_{ij}^2]\, E[x_j^2] = n_\textrm{in} \sigma^2 \gamma^2.$$ Note that the distributions do not have to be Gaussian; only the mean and variance must exist. To keep this variance fixed during forward propagation and prevent it from changing across layers, the initialization must satisfy the condition $n_\textrm{in} \sigma^2 = 1$.
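A quick simulation can illustrate the derivation. The sketch below (dimensions, batch size, and the Gaussian sampling are illustrative assumptions, not from the text) draws weights with $\sigma^2 = 1/n_\textrm{in}$ so that $n_\textrm{in}\sigma^2 = 1$, and checks that the output variance stays close to the input variance $\gamma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the sketch.
n_in, n_out, batch = 256, 256, 10_000
gamma = 1.0                      # input standard deviation
sigma = (1.0 / n_in) ** 0.5      # weight std chosen so n_in * sigma^2 = 1

# Inputs and weights drawn independently with mean 0.
x = rng.normal(0.0, gamma, size=(batch, n_in))
w = rng.normal(0.0, sigma, size=(n_in, n_out))

# Fully connected layer without nonlinearity: o_i = sum_j w_ij * x_j.
o = x @ w

# Empirically, Var[o] ≈ n_in * sigma^2 * gamma^2 = gamma^2 = 1,
# so the variance is preserved through the layer.
print(np.var(x), np.var(o))
```

With any other choice of $\sigma^2$ the printed output variance would be scaled by $n_\textrm{in}\sigma^2$ per layer, which is exactly the drift the condition prevents.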


Updated 2026-05-06


Tags

D2L

Dive into Deep Learning @ D2L