Concept

Layer Normalization in Transformers

Layer normalization (LN) is a widely used architectural component that is critical for stabilizing the training of deep networks such as Transformers. It normalizes the inputs across the feature dimension for each example independently, then applies a learned scale and shift; the defining computation is shown below. Key areas of research and improvement for LN in Transformers include its placement within the architecture (post-LN after each residual connection, as in the original Transformer, versus pre-LN before each sublayer), the development of effective substitutes such as RMSNorm, and the creation of normalization-free models.
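For reference, the standard formulation of layer normalization (a well-known definition, not spelled out on this card) computes, for a feature vector $\mathbf{x} \in \mathbb{R}^d$:

$$
\mathrm{LN}(\mathbf{x}) = \boldsymbol{\gamma} \odot \frac{\mathbf{x} - \mu}{\sqrt{\sigma^2 + \epsilon}} + \boldsymbol{\beta},
\qquad
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} \left(x_i - \mu\right)^2,
$$

where $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are learned scale and shift parameters and $\epsilon$ is a small constant for numerical stability.

The sketch below is a minimal NumPy illustration of this computation and of the placement question mentioned above; `sublayer` stands in for an attention or feed-forward layer, and all names here are illustrative rather than taken from the course:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize x over its last (feature) axis, then apply a learned scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased variance, matching the formula above
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def post_ln_block(x, sublayer, gamma, beta):
    """Post-LN (original Transformer): normalize after the residual addition."""
    return layer_norm(x + sublayer(x), gamma, beta)

def pre_ln_block(x, sublayer, gamma, beta):
    """Pre-LN (common in modern LLMs): normalize before the sublayer."""
    return x + sublayer(layer_norm(x, gamma, beta))

# Example: a (batch, sequence, features) activation passed through a pre-LN block.
x = np.random.randn(2, 4, 8)
gamma, beta = np.ones(8), np.zeros(8)
y = pre_ln_block(x, lambda h: h @ np.random.randn(8, 8), gamma, beta)
```

Pre-LN tends to yield more stable gradients at depth, which is one reason placement remains an active design choice.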

Updated 2026-05-02

Tags

Data Science

Computing Sciences

Foundations of Large Language Models Course, Ch. 2 Generative Models
