Generalized Formula for Post-Norm Architecture

The generalized formula for an operation within a sub-layer of a Transformer block using the post-norm architecture is:

$$\text{output} = \text{LNorm}(F(\text{input}) + \text{input})$$

In this equation, $F$ represents the sub-layer's function (such as self-attention or a feed-forward network), $\text{input}$ is the data fed into the sub-layer, and $\text{LNorm}$ denotes Layer Normalization. This architecture implements a residual connection: the input is added to the function's output, and the normalization step is then applied to the sum.
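The post-norm formula can be illustrated with a minimal NumPy sketch. The names `layer_norm`, `post_norm_sublayer`, and `toy_feed_forward` are hypothetical helpers introduced only for this example, and the layer normalization here omits the learnable gain and bias that practical implementations usually include.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (token) to zero mean and unit variance over the feature axis.
    # Assumption: no learnable gain/bias, to keep the sketch short.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_sublayer(x, sublayer_fn):
    # output = LNorm(F(input) + input): residual addition first, normalization last.
    return layer_norm(sublayer_fn(x) + x)

def toy_feed_forward(x, w1, w2):
    # Stand-in for F: a two-layer feed-forward network with ReLU.
    return np.maximum(x @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, 8 features each (illustrative sizes)
w1 = rng.normal(size=(8, 32)) * 0.1
w2 = rng.normal(size=(32, 8)) * 0.1

out = post_norm_sublayer(x, lambda h: toy_feed_forward(h, w1, w2))
```

Because normalization is the last step, every row of `out` has approximately zero mean and unit variance, regardless of the scale of the sub-layer function's output. Any function that preserves the input shape, such as a self-attention function, could be passed in place of `toy_feed_forward`.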

Updated 2026-04-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.1 Pre-training - Foundations of Large Language Models
