Formula

Self-Attention Output Formula for a Single Query

In the Query-Key-Value (QKV) attention mechanism, the output for an individual query vector $\mathbf{q}_i$ is a weighted sum of all value vectors in the sequence. For a sequence of length $m$, this operation is defined as:

$$\mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i, \mathbf{K}, \mathbf{V}) = \sum_{j=0}^{m-1} \alpha_{i,j}\, \mathbf{v}_j$$

Here, $\alpha_{i,j}$ is the normalized attention weight that quantifies the relationship between the query at position $i$ and the key at position $j$, and $\mathbf{v}_j$ is the value vector at position $j$.
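The weighted sum above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes the weights $\alpha_{i,j}$ come from a softmax over scaled dot-product scores, which is the common choice, though the formula itself only requires that the weights be normalized.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def att_qkv(q, K, V):
    """Single-query QKV attention output: sum_j alpha_{i,j} * v_j.

    q: query vector (list of floats, length d_k)
    K: list of m key vectors, each of length d_k
    V: list of m value vectors, each of length d_v
    """
    d_k = len(q)
    # Scaled dot-product scores between the query and every key
    # (scaling by sqrt(d_k) is an assumption; any normalized weighting fits the formula).
    scores = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d_k) for k in K]
    alpha = softmax(scores)  # normalized attention weights, summing to 1
    d_v = len(V[0])
    # Weighted sum over the m value vectors.
    return [sum(alpha[j] * V[j][c] for j in range(len(V))) for c in range(d_v)]
```

For example, with a zero query every score is equal, the weights become uniform, and the output is simply the average of the value vectors.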

Updated 2026-04-22
