Key Hyperparameters of a Transformer Encoder

The architecture of a Transformer encoder is defined by several essential hyperparameters. These include the vocabulary size ($|V|$) and the embedding size ($d_e$) used for token representations. Additionally, the hidden size ($d$) specifies the input and output dimensionality for both the self-attention and the Feed-Forward Network (FFN) sub-layers. Other crucial hyperparameters are the number of attention heads ($n_{\mathrm{head}}$) for the multi-head self-attention mechanism, the internal hidden layer size of the FFN ($d_{\mathrm{ffn}}$), and the model depth ($L$), which indicates the number of stacked layers.
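To make these hyperparameters concrete, here is a minimal sketch of how they map onto a standard PyTorch encoder. The specific values (a BERT-base-like configuration) are illustrative assumptions, not prescribed by the concept above; positional encodings and dropout details are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative values roughly matching a BERT-base-style encoder
# (these numbers are assumptions for the example, not fixed by the concept).
V = 30000        # vocabulary size |V|
d_e = 768        # embedding size d_e (here chosen equal to the hidden size d)
d = 768          # hidden size d: input/output width of each sub-layer
n_head = 12      # number of attention heads
d_ffn = 3072     # internal hidden size of the FFN (commonly 4 * d)
L = 12           # model depth: number of stacked encoder layers

# Token embeddings: a |V| x d_e lookup table.
embedding = nn.Embedding(num_embeddings=V, embedding_dim=d_e)

# One encoder layer bundles multi-head self-attention (width d, n_head heads)
# with a position-wise FFN whose internal width is d_ffn.
layer = nn.TransformerEncoderLayer(
    d_model=d, nhead=n_head, dim_feedforward=d_ffn, batch_first=True
)

# Stack L identical layers to obtain the full encoder.
encoder = nn.TransformerEncoder(layer, num_layers=L)

tokens = torch.randint(0, V, (2, 16))   # (batch, sequence) of token ids
hidden = encoder(embedding(tokens))     # -> shape (batch, sequence, d)
print(hidden.shape)                     # torch.Size([2, 16, 768])
```

Note that when $d_e \neq d$, a linear projection is needed to map embeddings into the hidden size before the first layer; many models simply set $d_e = d$, as done here.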
