1Cademy - Transformer Block Sub-Layers

Transformer Block Sub-Layers

A Transformer model is primarily a stack of Transformer blocks, also referred to as layers. Each block consists of two stacked sub-layers: a self-attention sub-layer followed by a Feed-Forward Network (FFN) sub-layer. The sub-layers can be implemented with different normalization designs. In the post-norm architecture, each sub-layer computes $\mathrm{output} = \mathrm{LNorm}(F(\mathrm{input}) + \mathrm{input})$, where $F$ is the sub-layer function (self-attention or FFN), $\mathrm{LNorm}$ is layer normalization, and the added $\mathrm{input}$ term is the residual connection. The pre-norm architecture instead applies layer normalization to the input of $F$, before the residual addition.
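The two normalization designs can be compared in a minimal Python sketch. It is illustrative only: the function names (`layer_norm`, `post_norm`, `pre_norm`, `transformer_block`) are hypothetical, the layer normalization omits the learned gain and bias, and the two stub functions merely stand in for real self-attention and FFN computations. The pre-norm form used here, $\mathrm{output} = F(\mathrm{LNorm}(\mathrm{input})) + \mathrm{input}$, is the commonly used formulation and is an assumption, since the text above only gives the post-norm formula.

```python
import math
from typing import Callable, List

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm(x: Vector, f: SubLayer) -> Vector:
    """Post-norm sub-layer: output = LNorm(F(input) + input)."""
    fx = f(x)
    return layer_norm([a + b for a, b in zip(fx, x)])


def pre_norm(x: Vector, f: SubLayer) -> Vector:
    """Pre-norm sub-layer (assumed form): output = F(LNorm(input)) + input."""
    fx = f(layer_norm(x))
    return [a + b for a, b in zip(fx, x)]


def transformer_block(x: Vector, attn: SubLayer, ffn: SubLayer,
                      wrap: Callable[[Vector, SubLayer], Vector]) -> Vector:
    """One block: the self-attention sub-layer, then the FFN sub-layer."""
    return wrap(wrap(x, attn), ffn)


# Hypothetical stand-ins for the real sub-layer functions F.
def attn_stub(x: Vector) -> Vector:
    return [0.5 * v for v in x]


def ffn_stub(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    print(transformer_block(x, attn_stub, ffn_stub, post_norm))
    print(transformer_block(x, attn_stub, ffn_stub, pre_norm))
```

One visible difference in this sketch: the post-norm output is always normalized, while the pre-norm output keeps an unnormalized copy of the input on the residual path.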

Updated 2026-04-19

Contributors are:

Who are from:

State University of New York at Stony Brook
