Key Hyperparameters of a Transformer Encoder

The architecture of a Transformer encoder is defined by several essential hyperparameters. These include the vocabulary size ($|V|$) and the embedding size ($d_e$) used for token representations. Additionally, the hidden size ($d$) specifies the input and output dimensionality for both the self-attention and the Feed-Forward Network (FFN) sub-layers. Other crucial hyperparameters are the number of attention heads ($n_{\mathrm{head}}$) for the multi-head self-attention mechanism, the internal hidden layer size of the FFN ($d_{\mathrm{ffn}}$), and the model depth ($L$), which indicates the number of stacked layers.
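To make these hyperparameters concrete, here is a minimal sketch of how they map onto a standard PyTorch encoder. The specific values (a BERT-base-like configuration) are illustrative assumptions, not prescribed by the concept above; positional encodings and dropout details are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative values roughly matching a BERT-base-style encoder
# (these numbers are assumptions for the example, not fixed by the concept).
V = 30000        # vocabulary size |V|
d_e = 768        # embedding size d_e (here chosen equal to the hidden size d)
d = 768          # hidden size d: input/output width of each sub-layer
n_head = 12      # number of attention heads
d_ffn = 3072     # internal hidden size of the FFN (commonly 4 * d)
L = 12           # model depth: number of stacked encoder layers

# Token embeddings: a |V| x d_e lookup table.
embedding = nn.Embedding(num_embeddings=V, embedding_dim=d_e)

# One encoder layer bundles multi-head self-attention (width d, n_head heads)
# with a position-wise FFN whose internal width is d_ffn.
layer = nn.TransformerEncoderLayer(
    d_model=d, nhead=n_head, dim_feedforward=d_ffn, batch_first=True
)

# Stack L identical layers to obtain the full encoder.
encoder = nn.TransformerEncoder(layer, num_layers=L)

tokens = torch.randint(0, V, (2, 16))   # (batch, sequence) of token ids
hidden = encoder(embedding(tokens))     # -> shape (batch, sequence, d)
print(hidden.shape)                     # torch.Size([2, 16, 768])
```

Note that when $d_e \neq d$, a linear projection is needed to map embeddings into the hidden size before the first layer; many models simply set $d_e = d$, as done here.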
