Reduction of Covariate Shift via Layer Normalization

Normalizing the outputs of layers in deep neural networks, by subtracting the mean and dividing by the standard deviation, mitigates the internal covariate shift problem: the change in the distribution of each layer's inputs as the parameters of earlier layers are updated during training. Reducing this shift is a primary mechanism by which layer normalization improves overall training stability.
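The per-sample normalization described above can be sketched as follows. This is a minimal illustration using NumPy; the function name `layer_norm` and the small epsilon for numerical stability are assumptions, and the learnable gain and bias parameters of full layer normalization are omitted for clarity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample over its feature dimension:
    # subtract the mean and divide by the standard deviation.
    # eps guards against division by zero for constant inputs.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

# A toy batch of 2 samples with 4 features each,
# at very different scales.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
y = layer_norm(x)
# After normalization, each row has (approximately)
# zero mean and unit standard deviation, so downstream
# layers see inputs with a stable distribution.
print(y.mean(axis=-1))
print(y.std(axis=-1))
```

Because the statistics are computed per sample over the feature dimension, the result does not depend on batch size, which is one reason layer normalization is preferred over batch normalization in sequence models.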

Updated 2026-05-03

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L
