Learn Before
Concept

Stabilizing Intermediate Layers with Batch Normalization

During the training of a typical deep network, the intermediate variables can take values with widely varying magnitudes across layers, across units, and over time as model parameters are updated. This drift in distribution can hamper the convergence of the network and necessitate compensatory adjustments in learning rates. Batch normalization addresses this problem by adaptively centering and rescaling these intermediate variables using the mean and standard deviation of each minibatch, thereby keeping their distributions more stable throughout training. Although this stabilization effect was originally attributed to reducing internal covariate shift, that explanation has since been challenged and does not appear to be a valid account of why the technique works.

0

1

Updated 2026-05-13

Tags

Data Science

D2L

Dive into Deep Learning @ D2L