Learn Before
Structure of a Transformer Block
The core component of a Transformer model is the Transformer block, also referred to as a layer. Each block consists of two main sub-layers stacked sequentially: a self-attention sub-layer, which models the relationships between tokens in the sequence, and a feed-forward network (FFN) sub-layer, which applies a position-wise transformation to each token independently. Each sub-layer is wrapped with a residual connection and layer normalization, and these components can be arranged under different normalization schemes, such as the post-norm architecture, in which normalization is applied after the residual addition.
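As a concrete illustration, the following is a minimal PyTorch sketch of a post-norm block. The specific dimensions (d_model=512, n_heads=8, d_ff=2048) and the ReLU activation are assumptions chosen for the example, not values fixed by the course material.

import torch
import torch.nn as nn

class PostNormTransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Self-attention sub-layer: lets each token attend to other tokens.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Feed-forward sub-layer: position-wise transformation of each token.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Post-norm: LayerNorm is applied *after* each residual addition.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ffn(x))
        return x

# Usage: a batch of 2 sequences of 10 token representations, width 512.
h = torch.randn(2, 10, 512)
print(PostNormTransformerBlock()(h).shape)  # torch.Size([2, 10, 512])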
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Training Decoder-Only Language Models with Cross-Entropy Loss
Output Probability Calculation in Transformer Language Models
Global Nature of Standard Transformer LLMs
Processing Flow of Autoregressive Generation in a Decoder-Only Transformer
Initial Input Representation for Transformer Layers
Greedy Decoding in Language Models
A generative language model is designed to produce text by predicting the next token based solely on the sequence of tokens that came before it. If you were to adapt a standard Transformer decoder block for this specific auto-regressive task, which of its sub-layers would you remove, and why is this modification functionally necessary?
A language model is constructed using a stack of modified Transformer decoder blocks. Each block contains a self-attention sub-layer and a feed-forward network sub-layer, but lacks the sub-layer that would process information from a separate, secondary input sequence. This model is capable of performing a machine translation task, such as translating a German sentence into English, without any further architectural changes.
Function of Self-Attention in Auto-regressive Generation
Neural Network-Based Next-Token Probability Distribution
Learn After
Formula for Post-Normalization in a Transformer Sub-layer
A standard Transformer block processes an input sequence through two main sub-layers using a post-normalization scheme. Arrange the following operations in the correct order from start to finish for a single block.
A language model built with Transformer blocks consistently produces grammatically correct sentences, but the sentences lack contextual coherence. For instance, given the input 'The scientist carefully placed the sample under the microscope to observe its...', the model generates '...color is a vibrant shade of the car.' Which sub-layer within the Transformer blocks is most likely failing to perform its primary function, leading to this specific type of error?
Component Roles in a Transformer Block
Transformer Block Inputs and Outputs Notation