Hidden Size in Transformer Models
In Transformer architectures, the hidden size, denoted d_model, specifies the dimensionality of the input and output vectors of each sub-layer. Furthermore, the majority of the internal hidden states generated within these sub-layers are also d_model-dimensional vectors. Because it determines the size of these internal representations, d_model can generally be interpreted as a measure of the overall width of the network.
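The role of d_model is easy to see in code. Below is a minimal PyTorch sketch; the specific values (d_model=512, 8 heads, FFN size 2048) are illustrative assumptions, not values prescribed by the text above.

```python
import torch
import torch.nn as nn

d_model = 512  # hidden size: dimensionality of sub-layer inputs and outputs

# A single encoder sub-layer stack; d_model fixes the width throughout.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,       # width of every residual/hidden representation
    nhead=8,               # number of attention heads (must divide d_model)
    dim_feedforward=2048,  # FFN inner size, commonly set to 4 * d_model
    batch_first=True,
)

x = torch.randn(2, 16, d_model)  # (batch, sequence, d_model)
y = encoder_layer(x)

# Input and output share the same d_model-dimensional last axis.
print(x.shape, y.shape)  # torch.Size([2, 16, 512]) torch.Size([2, 16, 512])
```

Note that only the last dimension is governed by d_model; batch and sequence lengths are unconstrained, which is what makes it natural to read d_model as the network's width.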
Related
Hyperparameter Tuning Trade-offs
FFN Hidden Size in Transformers
Vocabulary Size in Transformers
Model Depth in Transformers
Number of Attention Heads
Embedding Size in Transformer Models
Evaluating Language Model Design Choices
Trade-offs in Language Model Vocabulary Design
Learn After
Impact of Hidden Size on Sub-Layer Dimensions