Learn Before
Dimensionality of an Attention Head Output
In a multi-head attention mechanism, the output of each individual attention head is a vector. This vector belongs to a real-valued vector space of dimension d_model/τ, where d_model is the model's overall embedding dimension and τ is the number of parallel heads. This space is represented by the notation:

ℝ^(d_model/τ)
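The per-head dimensionality stated above can be checked directly in code. The following is a minimal sketch of a single attention head, assuming PyTorch and illustrative sizes (d_model = 512 and 8 heads, so τ = 8); these specific values are not from the card itself. The head's output dimension is d_model divided by the number of heads, not d_model.

import torch

d_model, num_heads = 512, 8        # assumed example sizes; num_heads plays the role of τ
d_head = d_model // num_heads      # per-head output dimension: 512 / 8 = 64

x = torch.randn(1, 10, d_model)    # (batch, sequence length, model dimension)

W_q = torch.randn(d_model, d_head) # this head's Query projection
W_k = torch.randn(d_model, d_head) # this head's Key projection
W_v = torch.randn(d_model, d_head) # this head's Value projection

q, k, v = x @ W_q, x @ W_k, x @ W_v                                      # each (1, 10, d_head)
scores = torch.softmax(q @ k.transpose(-2, -1) / d_head ** 0.5, dim=-1)  # attention weights
head_output = scores @ v                                                 # (1, 10, d_head)

print(head_output.shape[-1])       # 64, i.e. d_model / num_heads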
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Multi-Head Attention Output Calculation
Causal Attention Output for a Single Head and Token
In a multi-head attention mechanism, each individual attention head computes its output using its own unique Query, Key, and Value matrices, which are distinct linear projections of the same input. What is the primary functional consequence of this design choice?
Debugging an Attention Head
Dimensionality of an Attention Head Output
You are examining the computation for a single attention head within a multi-head attention layer. Arrange the following steps in the correct chronological order to produce the output for this individual head.
Autoregressive Individual Attention Head Computation
Learn After
Multi-Head Attention Output Calculation
In a multi-head attention mechanism, the model's overall embedding dimension is 768. If this mechanism is configured with 12 separate, parallel attention heads, what is the dimension of the output vector produced by a single one of these heads?
Relationship Between Head and Model Dimensions
In a multi-head attention mechanism where the overall model dimension is d_model and there are τ parallel attention heads (where τ > 1), the output vector of a single attention head has a dimension of d_model/τ.
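As a quick check of this relationship using the figures from the question above: a model dimension of 768 split across τ = 12 heads gives

d_model / τ = 768 / 12 = 64

so each individual head produces a 64-dimensional output vector.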