Inspecting Learned Batch Normalization Parameters
After training a batch-normalized network, the learned scale (γ) and shift (β) parameters of each batch normalization layer can be inspected to check how far they have moved from their initial values of 1 and 0, respectively. For example, after training a batch-normalized LeNet on Fashion-MNIST for 10 epochs, the first batch normalization layer, which normalizes the 6-channel output of the first convolutional layer, has γ values in the approximate range of 1.4 to 2.1 and β values ranging from roughly -1.4 to 1.3. These non-trivial learned values indicate that the network uses the affine transformation to recover representational capacity beyond plain zero-mean, unit-variance normalization: each channel's output distribution is rescaled and shifted toward values that work well for the downstream activation function and subsequent layers.
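The inspection described above can be sketched as follows. This is a minimal illustration, assuming PyTorch and its built-in `nn.BatchNorm2d` layer, which stores the scale γ as `weight` and the shift β as `bias`. The small network here is a hypothetical stand-in for the first block of a batch-normalized LeNet and is left untrained, so it prints the initial values (all ones and all zeros) rather than the learned values quoted above.

```python
import torch
from torch import nn

# Hypothetical stand-in for the first block of a batch-normalized LeNet:
# a convolution with 6 output channels followed by batch normalization.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),
    nn.BatchNorm2d(6),
    nn.Sigmoid(),
)

bn = net[1]

# nn.BatchNorm2d keeps the scale (gamma) in `weight` and the shift (beta)
# in `bias`, one entry per channel.
gamma = bn.weight.detach().reshape(-1)
beta = bn.bias.detach().reshape(-1)

print("gamma:", gamma)
print("beta: ", beta)

# Distance from the initial values (gamma = 1, beta = 0). Both are zero for
# an untrained layer and grow as training adapts the affine transformation.
print("max |gamma - 1|:", (gamma - 1).abs().max().item())
print("max |beta|:     ", beta.abs().max().item())
```

Running the same inspection on a trained network shows the learned per-channel values instead. A batch normalization layer written from scratch may expose these parameters under different attribute names, such as `gamma` and `beta`.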
Tags
D2L
Dive into Deep Learning @ D2L