Learn Before
Concept

Transformer Block Sub-Layers

The primary structure of a Transformer model consists of a stack of Transformer blocks, also referred to as layers. Each block is constructed from two stacked sub-layers: one dedicated to self-attention modeling and another for Feed-Forward Network (FFN) modeling. The internal structure of these sub-layers can be implemented using different normalization designs. In the post-norm architecture, layer normalization is applied after the residual connection, defined mathematically as $\mathrm{output} = \mathrm{LNorm}(F(\mathrm{input}) + \mathrm{input})$; in the pre-norm architecture, normalization is applied to the sub-layer input instead, giving $\mathrm{output} = F(\mathrm{LNorm}(\mathrm{input})) + \mathrm{input}$.
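The two normalization designs above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full Transformer sub-layer: `F` stands in for either the self-attention or FFN sub-layer (here a toy feed-forward map), and the function names are illustrative, not from any specific library.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each vector along the last axis to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_sublayer(x, F):
    # Post-norm: output = LNorm(F(input) + input)
    return layer_norm(F(x) + x)

def pre_norm_sublayer(x, F):
    # Pre-norm: output = F(LNorm(input)) + input
    return F(layer_norm(x)) + x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 4))          # toy (batch, hidden) input
    W = rng.normal(size=(4, 4))
    F = lambda h: np.maximum(h @ W, 0)   # toy FFN standing in for a sub-layer

    y_post = post_norm_sublayer(x, F)
    y_pre = pre_norm_sublayer(x, F)
    print(y_post.shape, y_pre.shape)
```

Note the structural difference: post-norm normalizes the summed output, so every block emits a normalized vector, while pre-norm leaves an unnormalized residual path running through the stack, which is often credited with more stable training in deep models.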

Updated 2026-04-19

Tags

Transformer

Data Science

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Related
Learn After