Concept

Structure of a Transformer Block

The core component of a Transformer model is the Transformer block, also referred to as a layer. Each block consists of two main sub-layers stacked sequentially: a self-attention sub-layer, which models relationships between tokens in the sequence, and a feed-forward network (FFN) sub-layer, which applies further position-wise computation to each token independently. Each sub-layer is typically wrapped in a residual connection and layer normalization, and the placement of the normalization gives rise to different schemes, such as the post-norm architecture, where normalization is applied after the residual addition.
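The structure above can be sketched in code. The following is a minimal single-head, post-norm Transformer block in NumPy; the parameter names (`Wq`, `Wk`, `Wv`, `W1`, `W2`), the single attention head, and the ReLU activation in the FFN are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over the sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def ffn(x, W1, b1, W2, b2):
    # Position-wise feed-forward network with a ReLU nonlinearity.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_block_postnorm(x, p):
    # Post-norm: layer norm is applied AFTER each residual addition.
    h = layer_norm(x + self_attention(x, p["Wq"], p["Wk"], p["Wv"]))
    return layer_norm(h + ffn(h, p["W1"], p["b1"], p["W2"], p["b2"]))

# Toy dimensions: sequence of 5 tokens, model width 8, FFN width 32.
rng = np.random.default_rng(0)
d, d_ff, seq = 8, 32, 5
params = {
    "Wq": rng.normal(size=(d, d)) * 0.1,
    "Wk": rng.normal(size=(d, d)) * 0.1,
    "Wv": rng.normal(size=(d, d)) * 0.1,
    "W1": rng.normal(size=(d, d_ff)) * 0.1,
    "b1": np.zeros(d_ff),
    "W2": rng.normal(size=(d_ff, d)) * 0.1,
    "b2": np.zeros(d),
}
x = rng.normal(size=(seq, d))
y = transformer_block_postnorm(x, params)
print(y.shape)  # (5, 8): same shape in and out, so blocks can be stacked
```

Because the output shape matches the input shape, many such blocks can be stacked to form the full model; a pre-norm variant would instead apply `layer_norm` to the sub-layer input before the residual addition.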

Updated 2026-04-19

Tags

Ch.2 Generative Models - Foundations of Large Language Models