Concept

Batch Normalization Mathematical Implementation from Scratch

An implementation of batch normalization from scratch must distinguish between training and prediction modes. In training mode, it computes the mean and variance of the current minibatch: for fully connected layers, one pair of statistics per feature, taken across the examples in the batch; for convolutional layers, one pair per channel, taken across the examples and all spatial positions. It normalizes the input with these statistics (adding a small constant to the variance for numerical stability) and updates exponential moving averages of the mean and variance. In prediction mode, it skips the batch statistics and normalizes with the tracked moving averages instead. In both modes, it then applies the learnable scale ($\boldsymbol{\gamma}$) and shift ($\boldsymbol{\beta}$) parameters to the normalized output.
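The two modes can be sketched in a few lines of plain Python. This is a minimal illustration rather than the reference implementation: it covers only the fully connected case (input as a list of rows, batch size by number of features), and the function name `batch_norm`, the defaults `eps=1e-5` and `momentum=0.1`, and the moving-average update convention are all assumptions made for the example. A convolutional version would pool each channel's statistics over the batch and spatial positions in the same way.

```python
import math

def batch_norm(X, gamma, beta, moving_mean, moving_var,
               training, eps=1e-5, momentum=0.1):
    """Batch normalization sketch for a fully connected layer.

    X is a list of rows (batch_size x num_features); gamma, beta,
    moving_mean and moving_var are per-feature lists.
    Returns (Y, moving_mean, moving_var).
    """
    n, d = len(X), len(X[0])
    if training:
        # Per-feature mean and (biased) variance over the minibatch.
        mean = [sum(row[j] for row in X) / n for j in range(d)]
        var = [sum((row[j] - mean[j]) ** 2 for row in X) / n
               for j in range(d)]
        # Exponential moving averages; this update convention
        # (new = (1 - momentum) * old + momentum * batch) is an assumption.
        moving_mean = [(1 - momentum) * m + momentum * b
                       for m, b in zip(moving_mean, mean)]
        moving_var = [(1 - momentum) * v + momentum * b
                      for v, b in zip(moving_var, var)]
    else:
        # Prediction mode: use the tracked statistics, leave them unchanged.
        mean, var = moving_mean, moving_var
    # Normalize, then apply the learnable scale and shift.
    Y = [[gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
          for j in range(d)] for row in X]
    return Y, moving_mean, moving_var

# Usage: one training step, then a prediction pass with the tracked averages.
X = [[1.0, 10.0], [3.0, 30.0]]
gamma, beta = [1.0, 1.0], [0.0, 0.0]
Y_train, mm, mv = batch_norm(X, gamma, beta, [0.0, 0.0], [1.0, 1.0],
                             training=True)
Y_pred, mm2, mv2 = batch_norm(X, gamma, beta, mm, mv, training=False)
```

In the training call each feature column is normalized to roughly zero mean and unit variance within the batch, while the prediction call gives different outputs for the same input because it relies on the moving averages, which have only moved a small step toward the batch statistics.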

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L