Definition

Hidden Size in Transformer Models

In Transformer architectures, the hidden size, denoted as d, specifies the dimensionality of the input and output vectors for each sub-layer. Furthermore, the majority of the internal hidden states generated within these sub-layers are also d-dimensional vectors. Because it determines the size of these internal representations, d can generally be interpreted as a measure of the overall width of the network.
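To make the definition concrete, the following is a minimal sketch of a single feed-forward sub-layer in plain Python. It is illustrative only and not taken from the source: the helper names (`linear`, `random_matrix`, `feed_forward_sublayer`), the tiny hidden size of 8, and the inner width of 4 times the hidden size are all assumptions. The point it shows is that the vector entering the sub-layer and the vector leaving it have the same length, the hidden size, even though one internal state here is wider.

```python
import random


def linear(x, weight):
    """Multiply a vector x (length n_in) by a weight matrix (n_in rows, n_out columns)."""
    n_out = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(n_out)]


def random_matrix(n_in, n_out, rng):
    """Build an n_in x n_out matrix of small random weights (hypothetical initialisation)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(n_out)] for _ in range(n_in)]


def feed_forward_sublayer(x, w_in, w_out):
    """Feed-forward sub-layer with a residual connection.

    Input and output are both d-dimensional; the inner activation is d_ff-dimensional.
    """
    inner = [max(0.0, v) for v in linear(x, w_in)]  # ReLU, length d_ff
    out = linear(inner, w_out)                      # projected back to length d
    return [a + b for a, b in zip(x, out)]          # residual add needs matching length d


d = 8         # hidden size (assumed toy value; real models use far larger widths)
d_ff = 4 * d  # inner width of the feed-forward sub-layer (assumed ratio)

rng = random.Random(0)
w_in = random_matrix(d, d_ff, rng)
w_out = random_matrix(d_ff, d, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # one token's hidden state
y = feed_forward_sublayer(x, w_in, w_out)

print(len(x), len(y))  # prints "8 8"
```

The residual addition in the last line of `feed_forward_sublayer` only works because the sub-layer's output has the same length as its input, which is one practical reason every sub-layer shares the same hidden size.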

Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences