Learn Before
Formula

Layer Normalization

Layer normalization is a technique that standardizes the activations of a deep network by applying the normalization to one observation at a time, rather than across a minibatch. For an nn-dimensional input vector x\mathbf{x}, the layer normalization operation is defined as: LN(x)=xμ^σ^\textrm{LN}(\mathbf{x}) = \frac{\mathbf{x} - \hat{\mu}}{\hat{\sigma}} where the scalar mean μ^\hat{\mu} and the scalar variance σ^2\hat{\sigma}^2 are computed across the features of the single observation: μ^=1ni=1nxiandσ^2=1ni=1n(xiμ^)2+ϵ\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i \quad \textrm{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2 + \epsilon A small constant ϵ>0\epsilon > 0 is added to prevent division by zero. Because it operates on a single observation, both the offset and the scaling factor in layer normalization are scalars.

0

1

Updated 2026-05-13

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L