Learn Before
Generalized Formula for Post-Norm Architecture
The generalized formula for an operation within a sub-layer of a Transformer block using the post-norm architecture is:

output = LNorm(F(input) + input)

In this equation, F represents the sub-layer's function (such as self-attention or a feed-forward network), input is the data fed into the sub-layer, and LNorm denotes Layer Normalization. This architecture implements a residual connection: the input is added to the function's output before the normalization step is applied.
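For concreteness, here is a minimal sketch of this formula as a PyTorch module. The class name PostNormSubLayer and the use of nn.LayerNorm are illustrative assumptions, not taken from the source; the only thing the sketch asserts is the post-norm ordering itself.

```python
import torch
import torch.nn as nn

class PostNormSubLayer(nn.Module):
    """Post-norm wrapper for a sub-layer function F:
    output = LNorm(F(input) + input)."""

    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer           # F: e.g. self-attention or an FFN
        self.norm = nn.LayerNorm(d_model)  # LNorm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection first: add the input to F's output,
        # then apply layer normalization (post-norm ordering).
        return self.norm(self.sublayer(x) + x)

# Hypothetical usage: wrapping a feed-forward network as the sub-layer F.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
layer = PostNormSubLayer(512, ffn)
out = layer(torch.randn(8, 16, 512))  # shape preserved: (batch, seq, d_model)
```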

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
A single sub-layer within a neural network block receives an input tensor x and applies a function F to it. The block's architecture specifies that a residual connection and layer normalization are used. Which of the following sequences of operations correctly implements the post-normalization scheme for this sub-layer?
Generalized Formula for Post-Norm Architecture
A standard processing block in a neural network consists of two main sub-layers: a self-attention module and a feed-forward network (FFN). This block uses a post-normalization architecture, where a residual connection is followed by a normalization step for each sub-layer. Arrange the following computational steps in the correct sequence for a single input passing through one complete block.
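The full-block sequence that question describes can be sketched as follows, assuming PyTorch; the class name PostNormBlock and the specific attention and FFN choices (nn.MultiheadAttention, a two-layer ReLU FFN) are assumptions for illustration only.

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """One Transformer block with post-norm wiring per sub-layer:
    x -> attention -> add residual -> LNorm -> FFN -> add residual -> LNorm."""

    def __init__(self, d_model: int, d_ff: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sub-layer 1: self-attention, then residual add, then norm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(attn_out + x)
        # Sub-layer 2: feed-forward network, then residual add, then norm.
        x = self.norm2(self.ffn(x) + x)
        return x
```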
Debugging a Transformer Block Implementation
In a Transformer block sub-layer that uses a post-normalization architecture, the layer normalization operation is applied to the input before the sub-layer's primary function (e.g., self-attention or feed-forward network) is executed.
You’re debugging a Transformer block in an interna...
You are reviewing a teammate’s implementation of a...
You’re implementing a single Transformer block in ...
Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)
Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement
Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics
Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change
Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts
Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring
Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block
Contextual Token Representation in Sub-layers
Core Function in Transformer Sub-layers
Learn After
A sub-layer in a neural network processes an input tensor using a specific architectural pattern. The process involves three key operations: 1) applying the sub-layer's primary function (e.g., self-attention), 2) applying a normalization function, and 3) adding the original input tensor to the result of the primary function (a residual connection). Arrange these three operations in the correct sequence that corresponds to the formula: output = LNorm(F(input) + input).
Analyzing a Sub-Layer Implementation
A developer is implementing a sub-layer (e.g., self-attention) within a Transformer block. They need to apply the sub-layer's function F, a residual connection (adding the original input), and a layer normalization LNorm operation. Which of the following expressions correctly represents the post-norm architectural pattern?