Definition

Dimensionality of an Attention Head Output

In a multi-head attention mechanism, the output of each individual attention head, denoted $\text{head}_j$, is a vector. This vector lies in a real-valued vector space of dimension $d_h$, written as: $\text{head}_j \in \mathbb{R}^{d_h}$
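The sketch below checks this dimensionality with a minimal, dependency-free scaled dot-product attention head. The sizes (`d_model = 8`, `num_heads = 2`, `seq_len = 3`), the random Gaussian projection weights, and the helper names are illustrative assumptions, not part of the definition. It also assumes the common convention from the original Transformer, $d_h = d_{\text{model}} / h$. Each head maps its input to one output row per token position, and each row is a vector in $\mathbb{R}^{d_h}$.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    # Hypothetical random weights; a trained model would learn these.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def attention_head(x, w_q, w_k, w_v):
    """One scaled dot-product attention head: returns a d_h-dim row per token."""
    q = matmul(x, w_q)  # seq_len x d_h
    k = matmul(x, w_k)  # seq_len x d_h
    v = matmul(x, w_v)  # seq_len x d_h
    d_h = len(w_q[0])
    scores = matmul(q, transpose(k))  # seq_len x seq_len
    weights = [softmax([s / math.sqrt(d_h) for s in row]) for row in scores]
    return matmul(weights, v)  # seq_len x d_h

rng = random.Random(0)
d_model, num_heads, seq_len = 8, 2, 3   # assumed sizes for illustration
d_h = d_model // num_heads              # assumed convention: d_h = d_model / h
x = random_matrix(seq_len, d_model, rng)

heads = []
for j in range(num_heads):
    # Each head has its own d_model x d_h projections for queries, keys, values.
    w_q, w_k, w_v = (random_matrix(d_model, d_h, rng) for _ in range(3))
    heads.append(attention_head(x, w_q, w_k, w_v))

for j, head in enumerate(heads):
    print(f"head_{j}: {len(head)} positions, each in R^{len(head[0])}")
```

Running it prints one line per head, reporting 3 positions, each in R^4. Under this convention, concatenating the `num_heads` head outputs at a position gives a vector of length `num_heads * d_h = d_model`.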

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences