Definition

Dimensionality of an Attention Head Output

In a multi-head attention mechanism, the output of each individual attention head, denoted as $\text{head}_j$, is a vector. This vector belongs to a real-valued vector space of dimension $d_h$, which is represented by the notation: $\text{head}_j \in \mathbb{R}^{d_h}$
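To make the dimension concrete, here is a minimal sketch of a single attention head written with the Python standard library only. It assumes scaled dot-product attention and the common convention d_h = d_model / n_heads; neither is stated in the definition above. The names `attention_head`, `rand_matrix`, `d_model`, and `n_heads` are hypothetical, chosen for illustration. For a sequence of several tokens the head produces one output vector per token, and each of those vectors has d_h entries.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v):
    """One head of scaled dot-product attention (assumed variant).

    x:             seq_len rows, each of length d_model
    w_q, w_k, w_v: d_model rows, each of length d_h
    returns:       seq_len rows, each of length d_h
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_h = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # seq_len x seq_len
    weights = [softmax([s / math.sqrt(d_h) for s in row]) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix; stands in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, n_heads = 4, 16, 2
d_h = d_model // n_heads  # assumption: the model width is split evenly across heads

x = rand_matrix(seq_len, d_model, rng)
w_q, w_k, w_v = (rand_matrix(d_model, d_h, rng) for _ in range(3))

head_j = attention_head(x, w_q, w_k, w_v)
print(len(head_j), len(head_j[0]))  # 4 8
```

Each row of `head_j` is the head's output vector for one token, so its length equals d_h (8 in this setup) regardless of the sequence length or the model width.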

Updated 2026-05-14

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L