1Cademy - Layer Normalization

Learn Before

Batch Normalization

Formula

Layer Normalization

Layer normalization standardizes the activations of a deep network one observation at a time, rather than across a minibatch as batch normalization does. For an $n$-dimensional input vector $\mathbf{x}$, the layer normalization operation is defined as:

$\textrm{LN}(\mathbf{x}) = \frac{\mathbf{x} - \hat{\mu}}{\hat{\sigma}}$

where the scalar mean $\hat{\mu}$ and the scalar variance $\hat{\sigma}^2$ are computed across the features of the single observation:

$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i \quad \textrm{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2 + \epsilon$

The small constant $\epsilon > 0$ is added to the variance so that $\hat{\sigma}$ is never zero, which prevents division by zero when all features of the observation are equal. Because the statistics are computed from a single observation, both the offset $\hat{\mu}$ and the scaling factor $\hat{\sigma}$ in layer normalization are scalars.
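The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `layer_norm` and the default `eps=1e-5` are assumptions chosen for the example, and the learnable scale and shift parameters that deep learning frameworks usually attach to layer normalization are left out because the formula above does not include them.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one observation x (a list of floats) across its features.

    eps is a hypothetical default; the text only requires eps > 0.
    """
    n = len(x)
    # Scalar mean over the n features of this single observation.
    mu = sum(x) / n
    # Biased variance (divide by n) with eps added, as in the formula.
    var = sum((xi - mu) ** 2 for xi in x) / n + eps
    sigma = math.sqrt(var)
    # Subtract the scalar offset and divide by the scalar scale.
    return [(xi - mu) / sigma for xi in x]


# One observation with n = 4 features.
x = [1.0, 2.0, 3.0, 4.0]
y = layer_norm(x)

# A minibatch is handled row by row; no statistics are shared across rows.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
normalized_batch = [layer_norm(row) for row in batch]
```

The output `y` has mean zero and variance slightly below one, since $\epsilon$ is included in the denominator. Because each row is normalized with its own statistics, the result for an observation does not depend on the other observations in the minibatch, which is the main contrast with batch normalization.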

Updated 2026-05-13

References

Dive into Deep Learning
