Concept

Pre-Norm Architecture in Transformers

In Transformer-based systems, the pre-norm architecture is a sub-layer configuration in which layer normalization is applied to the input of each sub-layer, inside the residual branch, rather than to the output of the residual addition as in the original post-norm design. Concretely, each sub-layer computes output = x + Sublayer(LayerNorm(x)) instead of output = LayerNorm(x + Sublayer(x)), leaving an unnormalized identity path through the entire stack. Because this placement is remarkably effective at stabilizing the training of deep networks, pre-norm serves as the structural basis for the majority of modern Large Language Models.
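The pattern is easiest to see in code. Below is a minimal sketch of a single pre-norm block, assuming PyTorch; the hyperparameters (d_model, n_heads, d_ff) and the GELU activation are illustrative choices, not details fixed by the text above.

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """A single pre-norm Transformer block.

    Each sub-layer computes x + Sublayer(LayerNorm(x)), so the
    residual stream itself never passes through a normalization
    layer. A post-norm block would instead compute
    LayerNorm(x + Sublayer(x)).
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sub-layer: normalize the input, not the output.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Feed-forward sub-layer follows the same pre-norm pattern.
        x = x + self.ffn(self.norm2(x))
        return x
```

Note that the skip connections form an unbroken identity path from the block's input to its output, so gradients reach early layers without passing through any normalization; this is the usual account of why pre-norm trains stably at depth.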


Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
