Formula

Attention Output as a Weighted Sum of Values

The output of a self-attention layer for a single query vector $\mathbf{q}_i$ is computed as a weighted sum of all value vectors $\mathbf{v}_j$ in the sequence. The attention weights $\alpha_{i,j}$, which are calculated separately, determine the contribution of each value vector to the final output for the query. This relationship is expressed by the formula:

$$\text{Att}_{\text{qkv}}(\mathbf{q}_i, \mathbf{K}, \mathbf{V}) = \sum_{j=0}^{m-1} \alpha_{i,j} \mathbf{v}_j$$

where $m$ is the sequence length.
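As a minimal sketch of this weighted sum, assuming NumPy and assuming the weights $\alpha_{i,j}$ have already been computed (here produced by a softmax over hypothetical scores, purely for illustration):

```python
import numpy as np

def attention_output(alpha_i: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Weighted sum of value vectors for one query.

    alpha_i: attention weights for query i, shape (m,), assumed to sum to 1.
    V:       value vectors stacked row-wise, shape (m, d_v).
    Returns the attention output for query i, shape (d_v,).
    """
    return alpha_i @ V  # equals sum_j alpha_{i,j} * v_j

# Hypothetical example: m = 4 positions, d_v = 3 value dimensions.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))
scores = rng.normal(size=4)                      # stand-in for q_i . k_j scores
alpha_i = np.exp(scores) / np.exp(scores).sum()  # softmax -> weights sum to 1
out = attention_output(alpha_i, V)
print(out.shape)  # (3,)
```

The matrix product `alpha_i @ V` is exactly the sum in the formula: each row of `V` is scaled by its weight and the scaled rows are added together.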
