Learn Before
Formula

Batch Normalization Formula

Batch normalization is applied to individual layers by standardizing the inputs based on the statistics of the current minibatch B\mathcal{B}. For an input xB\mathbf{x} \in \mathcal{B}, the batch normalization BN\textrm{BN} is defined as:

BN(x)=γxμ^Bσ^B+β\textrm{BN}(\mathbf{x}) = \boldsymbol{\gamma} \odot \frac{\mathbf{x} - \hat{\boldsymbol{\mu}}_{\mathcal{B}}}{\hat{\boldsymbol{\sigma}}_{\mathcal{B}}} + \boldsymbol{\beta}

Here, μ^B=1BxBx\hat{\boldsymbol{\mu}}_{\mathcal{B}} = \frac{1}{|\mathcal{B}|} \sum_{\mathbf{x} \in \mathcal{B}} \mathbf{x} is the sample mean, and σ^B2=1BxB(xμ^B)2+ϵ\hat{\boldsymbol{\sigma}}_{\mathcal{B}}^2 = \frac{1}{|\mathcal{B}|} \sum_{\mathbf{x} \in \mathcal{B}} (\mathbf{x} - \hat{\boldsymbol{\mu}}_{\mathcal{B}})^2 + \epsilon is the sample variance with a small constant ϵ>0\epsilon > 0 added for numerical stability to prevent division by zero. The parameters γ\boldsymbol{\gamma} (scale parameter) and β\boldsymbol{\beta} (shift parameter) are learned during training to recover the degrees of freedom lost due to standardization.

0

2

Updated 2026-05-13

Tags

Data Science

D2L

Dive into Deep Learning @ D2L