Concept

Inspecting Learned Batch Normalization Parameters

After training a batch-normalized network, the learned scale ($\boldsymbol{\gamma}$) and shift ($\boldsymbol{\beta}$) parameters of each batch normalization layer can be inspected to check how far they have moved from their initial values of 1 and 0, respectively. For example, after training a batch-normalized LeNet on Fashion-MNIST for 10 epochs, the first batch normalization layer, which normalizes the 6-channel output of the first convolutional layer and so holds one scale and one shift per channel, exhibits $\boldsymbol{\gamma}$ values in the approximate range of 1.4 to 2.1 and $\boldsymbol{\beta}$ values spanning roughly $-1.4$ to 1.3. These non-trivial learned values indicate that the network uses the affine transformation to recover representational capacity beyond plain zero-mean, unit-variance normalization, adapting each channel's output distribution to suit the downstream activation function and subsequent layers.
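To make the effect of these parameters concrete, the following is a minimal sketch in plain Python of what a single channel's output looks like at initialization versus with shifted parameters. It is not the training code behind the numbers above: the helper `batch_norm_channel`, the synthetic activations, and the specific pair $\gamma = 2.1$, $\beta = -1.4$ are assumptions chosen for illustration from the ranges quoted in the text.

```python
import random
import statistics


def batch_norm_channel(values, gamma, beta, eps=1e-5):
    """Normalize one channel's values, then apply the affine map gamma * x_hat + beta.

    Hypothetical helper for illustration; it mirrors the training-mode
    computation for a single channel using the batch mean and the biased
    (population) variance.
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    normalized = [(v - mean) / (var + eps) ** 0.5 for v in values]
    return [gamma * x_hat + beta for x_hat in normalized]


random.seed(0)
# Synthetic pre-normalization activations for one channel (assumed data).
activations = [random.gauss(3.0, 5.0) for _ in range(1000)]

# Initial parameters (gamma = 1, beta = 0): plain zero-mean, unit-variance output.
at_init = batch_norm_channel(activations, gamma=1.0, beta=0.0)

# Illustrative values taken from the quoted ranges, not actual trained weights.
learned = batch_norm_channel(activations, gamma=2.1, beta=-1.4)

for name, out in [("init", at_init), ("learned", learned)]:
    mean = statistics.fmean(out)
    std = statistics.pstdev(out)
    print(f"{name}: mean={mean:.3f}, std={std:.3f}")
```

After normalization, the channel's output mean equals $\beta$ and its standard deviation is approximately $\gamma$, so a learned pair far from $(1, 0)$ means the layer is deliberately moving that channel away from the standardized distribution. To read the actual values from a trained model in PyTorch, note that `nn.BatchNorm2d` stores $\boldsymbol{\gamma}$ as its `weight` attribute and $\boldsymbol{\beta}$ as its `bias` attribute.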

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L