Learn Before
Batch Normalization Mathematical Implementation from Scratch
An implementation of batch normalization from scratch must distinguish between training mode and prediction mode. In training mode, it computes the mean and variance of the current minibatch: for fully connected layers, one pair of statistics per feature, taken over the batch dimension; for convolutional layers, one pair per channel, taken over the batch and spatial dimensions. It then normalizes the input with these statistics, adding a small constant ε to the variance for numerical stability, and updates exponential moving averages of the mean and variance. In prediction mode, it skips the minibatch statistics and normalizes with the tracked moving averages instead. In both modes, it finally applies the learnable scale (γ) and shift (β) parameters to the normalized output.
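The two modes described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the book's reference code: the function name `batch_norm`, the argument names, the defaults `eps=1e-5` and `momentum=0.1`, and the convention that `momentum` weights the new minibatch statistic are all choices made here for the example. Inputs are assumed to be either 2D `(batch, features)` or 4D `(batch, channels, height, width)`, with `gamma`, `beta`, and the moving statistics shaped to broadcast against them.

```python
import numpy as np


def batch_norm(X, gamma, beta, moving_mean, moving_var,
               training, eps=1e-5, momentum=0.1):
    """Batch norm for (batch, features) or (batch, channels, H, W) inputs.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if X.ndim not in (2, 4):
        raise ValueError("expected a 2D or 4D input")
    # Reduce over the batch axis (plus the spatial axes for convolutional
    # inputs) so that one statistic remains per feature or per channel.
    axes = (0,) if X.ndim == 2 else (0, 2, 3)

    if training:
        # Statistics of the current minibatch, kept broadcastable against X.
        mean = X.mean(axis=axes, keepdims=True)
        var = ((X - mean) ** 2).mean(axis=axes, keepdims=True)
        X_hat = (X - mean) / np.sqrt(var + eps)
        # Exponential moving averages, used later in prediction mode.
        # Assumed convention: momentum weights the new minibatch statistic.
        moving_mean = (1.0 - momentum) * moving_mean + momentum * mean
        moving_var = (1.0 - momentum) * moving_var + momentum * var
    else:
        # Prediction mode: normalize with the tracked moving averages.
        X_hat = (X - moving_mean) / np.sqrt(moving_var + eps)

    # Learnable scale (gamma) and shift (beta), applied in both modes.
    Y = gamma * X_hat + beta
    return Y, moving_mean, moving_var


# Usage on a fully connected input with 4 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
gamma, beta = np.ones((1, 4)), np.zeros((1, 4))
moving_mean, moving_var = np.zeros((1, 4)), np.ones((1, 4))

Y_train, moving_mean, moving_var = batch_norm(
    X, gamma, beta, moving_mean, moving_var, training=True)
Y_pred, _, _ = batch_norm(
    X, gamma, beta, moving_mean, moving_var, training=False)
```

For a convolutional input, the same function would be called with `gamma`, `beta`, and the moving statistics shaped `(1, channels, 1, 1)`. The function returns the updated moving averages rather than mutating them in place, which keeps the sketch free of hidden state; a layer class would typically store them as non-trainable attributes.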
Tags
D2L
Dive into Deep Learning @ D2L