Concept
Momentum in Batch Normalization Moving Averages
In implementations of batch normalization, a momentum parameter governs the exponential moving average that aggregates the mean and variance estimates from past minibatches during training. These aggregated statistics stand in for the dataset statistics and are used in place of per-batch statistics when the model runs in prediction mode. Despite the shared name, this parameter is unrelated to the momentum term used to accelerate optimization algorithms. The name is arguably a misnomer: it is simply the convention high-level APIs use for the smoothing factor of the tracked moving averages.
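As a rough illustration of the moving-average update, the following pure-Python sketch tracks a running mean for a single feature. The function name `update_moving_stat`, the batch values, the zero initial value, and the default of 0.1 are all assumptions made for this example. The sketch also assumes the convention in which `momentum` weights the newest batch statistic; some frameworks apply it to the old running value instead, so check the documentation of the API in use.

```python
def update_moving_stat(moving, batch_stat, momentum=0.1):
    """One exponential-moving-average step.

    Assumed convention: `momentum` is the weight given to the new
    batch statistic, and (1 - momentum) is kept from the old value.
    """
    return (1.0 - momentum) * moving + momentum * batch_stat


# Hypothetical per-batch means of one feature over four training steps.
batch_means = [2.0, 4.0, 6.0, 8.0]

moving_mean = 0.0  # assumed initial value for the tracked statistic
history = []
for batch_mean in batch_means:
    moving_mean = update_moving_stat(moving_mean, batch_mean)
    history.append(moving_mean)

# The tracked value is what prediction mode would use in place of a batch mean.
print(history)
```

With a value of 0.1 the tracked mean moves only a tenth of the way toward each new batch mean, so a larger value would make the running estimate react faster but also more noisily. Nothing in this update touches gradients or parameter updates, which is why the parameter is independent of optimizer momentum.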
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L