
Formula for Post-Normalization in a Transformer Sub-layer

In the post-norm architecture of a Transformer sub-layer, the output is calculated using a specific formula. First, the sub-layer's function, represented by $F$ (which could be either self-attention or a feed-forward network), is applied to the input. The result, $F(\text{input})$, is then added to the original input in a residual connection. Finally, Layer Normalization (LNorm) is applied to this sum. The complete formula is: $\text{output} = \text{LNorm}(F(\text{input}) + \text{input})$
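The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the learnable gain and bias parameters of Layer Normalization are omitted, and the toy ReLU feed-forward network standing in for $F$ is an assumption for demonstration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each vector over the last (feature) dimension
    # to zero mean and unit variance; gain/bias params omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_sublayer(x, f):
    # Post-norm: apply the sub-layer F, add the residual input,
    # then apply Layer Normalization to the sum.
    return layer_norm(f(x) + x)

# Toy feed-forward sub-layer standing in for F (illustrative only).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
ff = lambda x: np.maximum(x @ W, 0.0)  # ReLU feed-forward

x = rng.standard_normal((2, 4))        # (sequence length, model dim)
out = post_norm_sublayer(x, ff)
print(out.shape)
```

Because LNorm is applied last, every row of the output is normalized to roughly zero mean and unit variance, regardless of how large the residual sum grew inside the sub-layer.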


Updated 2025-10-09


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences