Formula

Layer Normalization Formula

A widely adopted form of the layer normalization function computes the normalized output for a $d$-dimensional real-valued vector $\mathbf{h}$ as follows:

$$\mathrm{LNorm}(\mathbf{h}) = \alpha \cdot \frac{\mathbf{h} - \mu}{\sigma + \epsilon} + \beta$$

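In this equation, $\mu$ and $\sigma$ are the mean and standard deviation of all the entries of $\mathbf{h} = (h_1, \dots, h_d)$:

$$\mu = \frac{1}{d}\sum_{k=1}^{d} h_k, \qquad \sigma = \sqrt{\frac{1}{d}\sum_{k=1}^{d} (h_k - \mu)^2}$$

The small positive constant $\epsilon$ keeps the denominator away from zero, which maintains numerical stability when the entries of $\mathbf{h}$ are nearly equal. The parameters $\alpha \in \mathbb{R}^{d}$ and $\beta \in \mathbb{R}^{d}$ are the learnable gain and bias vectors: $\alpha$ scales the normalized vector elementwise, and $\beta$ is added to the result.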
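To make each term of the formula concrete, here is a minimal, non-authoritative sketch in plain Python. The function name `lnorm`, the default `eps=1e-6`, and the use of plain lists instead of a tensor library are illustrative assumptions; $\sigma$ is taken as the population standard deviation (dividing by $d$), and $\epsilon$ is added to $\sigma$ exactly as in the formula above.

```python
import math
from typing import List, Sequence


def lnorm(
    h: Sequence[float],
    alpha: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-6,  # illustrative default, not taken from the source
) -> List[float]:
    """Sketch of LNorm(h) = alpha * (h - mu) / (sigma + eps) + beta, elementwise."""
    d = len(h)
    if d == 0:
        raise ValueError("h must be non-empty")
    if len(alpha) != d or len(beta) != d:
        raise ValueError("alpha and beta must have the same length as h")

    # Mean of all d entries of h.
    mu = sum(h) / d
    # Population standard deviation: divide by d, not d - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in h) / d)

    # Normalize each entry, then apply the gain alpha and bias beta elementwise.
    # eps is added to sigma (not to the variance), matching the formula above.
    return [a * (x - mu) / (sigma + eps) + b for x, a, b in zip(h, alpha, beta)]


if __name__ == "__main__":
    h = [1.0, 2.0, 3.0, 4.0]
    # With alpha = 1 and beta = 0, the output is just the standardized vector.
    out = lnorm(h, alpha=[1.0] * 4, beta=[0.0] * 4)
    print(out)
```

In this usage example the gain is all ones and the bias all zeros, so the output entries sum to roughly zero and come out at about ±0.45 and ±1.34. Some libraries place $\epsilon$ differently, adding it to the variance under the square root (PyTorch's `torch.nn.LayerNorm` does this), which gives slightly different values when $\sigma$ is very small.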


Updated 2026-04-21


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
