Learn Before
Guiding Principles Behind Batch Normalization
Beyond its individual benefits, batch normalization embodies three broader design principles that are conjectured to guide the invention of future normalization layers and training techniques. These principles—regularization through noise injection, acceleration through rescaling, and preprocessing—reframe the known benefits of batch normalization as generalizable architectural motifs. Regularization through noise injection captures how stochastic minibatch estimates of statistics introduce beneficial perturbation. Acceleration through rescaling describes how centering and scaling activations place parameters on comparable scales favorable for optimization. Preprocessing reflects the analogy between normalizing intermediate representations and the well-established practice of standardizing input features. While these principles overlap with the individual advantages described in the benefits of batch normalization, they are distinguished by their forward-looking character: the textbook authors conjecture that recognizing these mechanisms as transferable design patterns may inspire entirely new layers and techniques beyond batch normalization itself.
0
1
Tags
D2L
Dive into Deep Learning @ D2L