Omission of Bias Terms in LLM Affine Transformations
A popular design choice in Large Language Models (LLMs) is the removal of bias terms from affine transformations, reducing each transformation from y = Wx + b to y = Wx. This choice can be applied to several components, including layer normalization, the query-key-value (QKV) projections of the attention mechanism, and feed-forward networks (FFNs). Omitting the biases slightly reduces the parameter count, and it has been reported to improve training stability in very large models.
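As a minimal sketch of what this looks like in practice (assuming PyTorch 2.1+ for the bias=False option on nn.LayerNorm; the class name, dimensions, and hyperparameters below are illustrative, not taken from the text above), each affine map in a Transformer block is simply constructed without its bias term:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasFreeBlock(nn.Module):
    """Illustrative pre-norm Transformer block whose affine maps omit biases."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.n_heads = n_heads
        # Layer norm without the learned shift (beta); only the scale (gamma)
        # remains. The `bias` keyword requires PyTorch >= 2.1.
        self.norm1 = nn.LayerNorm(d_model, bias=False)
        self.norm2 = nn.LayerNorm(d_model, bias=False)
        # QKV and output projections as pure linear maps: y = Wx, no + b.
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        # Feed-forward network, again with bias terms omitted.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff, bias=False),
            nn.GELU(),
            nn.Linear(d_ff, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)

        # Reshape (B, T, C) -> (B, n_heads, T, head_dim) for multi-head attention.
        def split(t: torch.Tensor) -> torch.Tensor:
            return t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)

        attn = F.scaled_dot_product_attention(split(q), split(k), split(v),
                                              is_causal=True)
        attn = attn.transpose(1, 2).reshape(B, T, C)
        x = x + self.proj(attn)
        x = x + self.ffn(self.norm2(x))
        return x

# Usage: a forward pass over a batch of 2 sequences of length 16.
block = BiasFreeBlock()
y = block(torch.randn(2, 16, 512))
```

A common variant sidesteps the layer-normalization bias entirely: RMSNorm, used by models such as LLaMA, keeps only a learned scale and has no bias by construction.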
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Evaluating a Training Strategy for a New Large Model
Layer Normalization in Transformers
A research team is training a very deep language model based on a standard network design. They observe that as they increase the model's depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?
Rationale for Architectural Changes in Large-Scale Models
Connecting Model Scale and Architectural Design