Formula

Individual Attention Head Formula

Within a multi-head attention mechanism, the output of each attention head, denoted $\mathrm{head}_j$, is computed by applying the query-key-value (QKV) attention function to a specific sub-space of the model's representation. The operation uses the query, key, and value matrices for that particular head, $\mathbf{Q}^{[j]}$, $\mathbf{K}^{[j]}$, and $\mathbf{V}^{[j]}$, giving the equation

$$\mathrm{head}_j = \mathrm{Att}_{\mathrm{qkv}}\big(\mathbf{Q}^{[j]}, \mathbf{K}^{[j]}, \mathbf{V}^{[j]}\big)$$
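The following is a minimal Python sketch of this formula, assuming $\mathrm{Att}_{\mathrm{qkv}}$ is standard scaled dot-product attention, $\mathrm{Softmax}(\mathbf{Q}\mathbf{K}^{\top}/\sqrt{d_k})\,\mathbf{V}$; the function name `att_qkv` and the matrix dimensions are illustrative assumptions, not taken from the source.

```python
import numpy as np

def att_qkv(Q, K, V):
    """Scaled dot-product QKV attention (assumed form of Att_qkv).

    Q: (n, d_k) queries, K: (m, d_k) keys, V: (m, d_v) values.
    Returns an (n, d_v) matrix: Softmax(Q K^T / sqrt(d_k)) V.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, m) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n, d_v)

# head_j = Att_qkv(Q^[j], K^[j], V^[j]) for one head, with hypothetical sizes
rng = np.random.default_rng(0)
n, m, d_k, d_v = 4, 6, 8, 8                         # illustrative dimensions only
Q_j = rng.normal(size=(n, d_k))                     # Q^[j]
K_j = rng.normal(size=(m, d_k))                     # K^[j]
V_j = rng.normal(size=(m, d_v))                     # V^[j]
head_j = att_qkv(Q_j, K_j, V_j)                     # output of head j, shape (n, d_v)
print(head_j.shape)
```

Each call of `att_qkv` produces the output of one head; applying it once per head with that head's own $\mathbf{Q}^{[j]}$, $\mathbf{K}^{[j]}$, and $\mathbf{V}^{[j]}$ yields all of the head outputs.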

