Concept

Momentum in Batch Normalization Moving Averages

In batch normalization, a momentum parameter controls the exponential moving average that aggregates past estimates of the mean and variance during training. In prediction mode, the layer normalizes its inputs with these aggregated dataset statistics instead of the statistics of the current minibatch. The name is a misnomer: this parameter has nothing to do with the momentum term used to accelerate optimization algorithms. It is simply the name that high-level APIs conventionally give to the smoothing factor of the tracked moving averages.
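
The update can be illustrated with a minimal sketch in plain Python. It assumes the PyTorch-style convention, moving = (1 - momentum) * moving + momentum * batch_statistic, with a default of 0.1. Keras uses the opposite weighting, moving = momentum * moving + (1 - momentum) * batch_statistic, with a default of 0.99, so the same number means different things across libraries. The class name BatchNorm1D and the training flag are hypothetical. For simplicity, the sketch stores the biased batch variance, whereas PyTorch stores the unbiased estimate in its running variance. Note that momentum here only blends statistics. No gradient or parameter update is involved.

```python
import math


class BatchNorm1D:
    """Hypothetical minimal batch norm over the rows of a minibatch (plain Python).

    Assumes the PyTorch-style update
        moving = (1 - momentum) * moving + momentum * batch_statistic
    where `momentum` is only a smoothing factor: no gradients are involved.
    """

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.gamma = [1.0] * num_features   # learnable scale (kept fixed here)
        self.beta = [0.0] * num_features    # learnable shift (kept fixed here)
        self.moving_mean = [0.0] * num_features
        self.moving_var = [1.0] * num_features

    def __call__(self, X, training):
        n, d = len(X), len(self.gamma)
        if training:
            # Per-feature minibatch statistics (biased variance, for simplicity).
            mean = [sum(row[j] for row in X) / n for j in range(d)]
            var = [sum((row[j] - mean[j]) ** 2 for row in X) / n for j in range(d)]
            # Exponential moving average: blend old estimates with this batch.
            m = self.momentum
            self.moving_mean = [(1 - m) * old + m * new
                                for old, new in zip(self.moving_mean, mean)]
            self.moving_var = [(1 - m) * old + m * new
                               for old, new in zip(self.moving_var, var)]
        else:
            # Prediction mode: reuse the aggregated statistics, update nothing.
            mean, var = self.moving_mean, self.moving_var
        return [[self.gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + self.eps)
                 + self.beta[j] for j in range(d)]
                for row in X]


bn = BatchNorm1D(num_features=1, momentum=0.1)
for _ in range(100):
    bn([[4.0], [6.0]], training=True)  # batch mean 5.0, batch variance 1.0
print(bn.moving_mean, bn.moving_var)   # moving_mean approaches 5.0
print(bn([[5.0]], training=False))     # close to [[0.0]]
```

Starting from zero, after t identical batches with mean 5.0 the moving mean in this convention is 5.0 * (1 - 0.9**t). A larger momentum therefore tracks recent batches faster, and a smaller one smooths more heavily.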

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L