Learn Before
Concept

Batch Normalization as Regularization

Batch normalization naturally acts as a form of regularization because it uses noisy estimates of the mean (μ^B\hat{\boldsymbol{\mu}}{\mathcal{B}}) and standard deviation (σ^B\hat{\boldsymbol{\sigma}}{\mathcal{B}}) derived from the current minibatch. This variation injects noise into the optimization process, which often leads to faster training and less overfitting. This regularization effect is most optimal for moderate minibatch sizes (e.g., 5050100100), as larger minibatches regularize less due to more stable estimates, while tiny minibatches destroy useful signal due to excessively high variance. While these regularization and convergence benefits are well established, the original motivation that batch normalization works by reducing internal covariate shift does not appear to be a valid explanation for its success.

0

2

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L