1Cademy - Inspecting Learned Batch Normalization Parameters

Inspecting Learned Batch Normalization Parameters

After training a batch-normalized network, the learned scale ($\boldsymbol{\gamma}$) and shift ($\boldsymbol{\beta}$) parameters of each batch normalization layer can be inspected to check how far they have moved from their initial values of 1 and 0, respectively. For example, after training a batch-normalized LeNet on Fashion-MNIST for 10 epochs, the first batch normalization layer, which normalizes the 6-channel output of the first convolutional layer and therefore holds one $\gamma$ and one $\beta$ per channel, exhibits $\boldsymbol{\gamma}$ values in the approximate range of 1.4 to 2.1 and $\boldsymbol{\beta}$ values from roughly -1.4 to 1.3. These non-trivial learned values indicate that the network uses the affine transformation to recover representational capacity beyond plain zero-mean, unit-variance normalization. Each channel's normalized output is rescaled to a standard deviation of $|\gamma|$ and shifted to a mean of $\beta$, so training can adapt the output distribution of each channel to whatever suits the downstream activation function and subsequent layers.
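The effect of learned $\gamma$ and $\beta$ values on a single channel can be illustrated with a minimal sketch in plain Python. It is not the training code behind the numbers above: the helper `batch_norm_1d`, the synthetic activations, and the chosen values `gamma=2.0` and `beta=-1.0` are all assumptions made for illustration, picked only to fall inside the ranges quoted in the text.

```python
import random
import statistics


def batch_norm_1d(values, gamma, beta, eps=1e-5):
    """Normalize one channel to zero mean / unit variance, then apply
    the affine transform y = gamma * x_hat + beta."""
    mean = statistics.fmean(values)
    # Population (biased) variance of the batch.
    var = statistics.pvariance(values, mu=mean)
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


def summarize(values):
    """Return (mean, standard deviation) of a list of values."""
    return statistics.fmean(values), statistics.pstdev(values)


random.seed(0)
# Hypothetical pre-normalization activations for one channel.
activations = [random.gauss(5.0, 3.0) for _ in range(1000)]

# Initial parameters (gamma=1, beta=0): pure normalization.
identity_mean, identity_std = summarize(
    batch_norm_1d(activations, gamma=1.0, beta=0.0)
)

# Illustrative "learned" parameters inside the ranges quoted above.
learned_mean, learned_std = summarize(
    batch_norm_1d(activations, gamma=2.0, beta=-1.0)
)

print(f"initial: mean={identity_mean:.3f}, std={identity_std:.3f}")
print(f"learned: mean={learned_mean:.3f}, std={learned_std:.3f}")
```

With the initial parameters the channel's output has mean close to 0 and standard deviation close to 1. With the illustrative learned parameters the output mean tracks `beta` and the standard deviation tracks `gamma`. In a framework such as PyTorch, the corresponding per-channel values are stored in the `weight` ($\gamma$) and `bias` ($\beta$) attributes of an `nn.BatchNorm2d` layer, which is where they would be read from after training.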


Updated 2026-06-17


References

Dive into Deep Learning
