Learn Before
Inspecting Learned Batch Normalization Parameters
After training a batch-normalized network, the learned scale () and shift () parameters of each batch normalization layer can be inspected to verify that they have diverged significantly from their initial values of and , respectively. For example, after training a batch-normalized LeNet on Fashion-MNIST for epochs, the first batch normalization layer—which normalizes the -channel output of the first convolutional layer—exhibits values in the approximate range of to and values spanning from roughly to . These non-trivial learned values confirm that the network exploits the affine transformation to recover representational capacity beyond the unit-variance, zero-mean normalization, adapting the output distribution of each channel to values that are most effective for the downstream activation function and subsequent layers.
0
1
Tags
D2L
Dive into Deep Learning @ D2L