Layer Normalization in Transformers
Layer normalization (LN) is a widely used architectural component that is critical for stabilizing the training of deep networks like Transformers. It operates by normalizing the inputs across all features of each training example independently: the example's features are standardized to zero mean and unit variance, then rescaled and shifted with learned gain and bias parameters. Key areas of research and improvement for LN in Transformers include its placement within the architecture (pre-norm vs. post-norm), the development of effective substitutes such as RMS layer normalization, and the creation of normalization-free models.
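As a minimal sketch of this computation (the function and variable names here are illustrative, not taken from any particular library), the standard formulation can be written as:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standardize each example's features to zero mean and unit variance,
    then apply a learned elementwise gain (gamma) and bias (beta)."""
    mean = x.mean(axis=-1, keepdims=True)    # per-example mean over features
    var = x.var(axis=-1, keepdims=True)      # per-example variance over features
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * x_hat + beta              # learned affine transform
```

The learned gamma and beta matter: without them, standardization would constrain what the layer can represent, which is the concern behind the "Restoring Representational Power in Normalization" topic listed below.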

Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention-level improvements of Transformers
Positional Representations of Transformers
Improvements to the FFN of a transformer
Layer Normalization in Transformers
Evaluating a Training Strategy for a New Large Model
A research team is training a very deep language model based on a standard network design. They observe that as they increase the model's depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?
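The failure mode described here is the classic motivation for pre-norm placement: applying layer normalization before each sub-layer rather than after it, so the residual path stays identity-like as depth grows. As a hedged sketch (the class and parameter names are my own assumptions, not from the source), a pre-norm block in PyTorch might look like:

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Transformer block with pre-norm wiring: x + Sublayer(LN(x)).
    Contrast with post-norm, LN(x + Sublayer(x)), which tends to be
    harder to train at large depth."""

    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.ffn(self.ln2(x))                      # residual around FFN
        return x

# Usage sketch: batch of 2 sequences, length 10, model width 64.
block = PreNormBlock(d_model=64, n_heads=4, d_ff=256)
y = block(torch.randn(2, 10, 64))
```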
Rationale for Architectural Changes in Large-Scale Models
Connecting Model Scale and Architectural Design
Omission of Bias Terms in LLM Affine Transformations
Learn After
Placement of Layer Normalization in transformers
Substitutes of Layer Normalization in transformers
Normalization-free transformer
Layer Normalization Formula
Root Mean Square (RMS) Layer Normalization
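The "Root Mean Square (RMS) Layer Normalization" entry above refers to a widely used LN substitute that skips mean-centering and the bias term, rescaling by the root mean square of the features alone. A minimal sketch (names are illustrative):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm: rescale each example by the root mean square of its features.
    Unlike standard LN, it neither subtracts the mean nor adds a bias,
    which makes it cheaper while often matching LN's stabilizing effect."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * (x / rms)
```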
An engineer is training a deep neural network for a language task. They observe that during training, the distribution of the outputs of intermediate layers changes drastically from one step to the next, causing the training process to become very slow and unstable. To mitigate this, they insert an operation that, for each individual data point, computes the mean and variance of all the features in its intermediate representation. It then uses these statistics to standardize the representation before passing it to the next layer. What fundamental problem in deep network training is this operation designed to address?
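To make the operation in this question concrete, here is a small self-contained check (illustrative code, not from the source) showing that per-example standardization pins the statistics of a representation even when the incoming distribution drifts between training steps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate intermediate representations whose distribution drifts step to step.
step_a = rng.normal(loc=0.0, scale=1.0, size=(4, 8))  # early-training statistics
step_b = rng.normal(loc=5.0, scale=3.0, size=(4, 8))  # drifted statistics later on

def standardize(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

for name, x in [("before drift", step_a), ("after drift", step_b)]:
    z = standardize(x)
    print(name, "-> per-example mean ~", z.mean(axis=-1).round(6),
          "std ~", z.std(axis=-1).round(3))
# Both cases print means ~0 and stds ~1: the next layer always sees inputs
# with stable statistics, which is the shifting-distribution problem the
# question is pointing at.
```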
Restoring Representational Power in Normalization
Applying Layer Normalization
You’re debugging a Transformer block in an interna...
You are reviewing a teammate’s implementation of a...
You’re implementing a single Transformer block in ...
Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)
Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement
Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics
Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change
Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts
Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring
Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block
Reduction of Covariate Shift via Layer Normalization