Self-Attention Output Formula for a Single Query
In the Query-Key-Value (QKV) attention mechanism, the output for an individual query vector is computed as a weighted sum of all value vectors in the sequence. For a sequence of length $m$, this operation is defined as

$$\mathbf{o}_i = \sum_{j=1}^{m} \alpha_{i,j}\, \mathbf{v}_j$$

Here, $\alpha_{i,j}$ is the normalized attention weight that quantifies the relationship between the query at position $i$ and the key at position $j$, while $\mathbf{v}_j$ represents the value vector at position $j$.
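A minimal sketch of this computation in NumPy, assuming (as is conventional, though not stated above) that the raw scores come from query-key dot products and are normalized with a softmax; the function name and shapes are illustrative, not taken from the source:

```python
import numpy as np

def single_query_attention(q, K, V):
    """q: (d_k,) query; K: (m, d_k) keys; V: (m, d_v) values."""
    beta = K @ q                        # raw scores, one per position j (dot-product scoring is an assumption)
    beta = beta - beta.max()            # shift for numerical stability; softmax is invariant to this
    alpha = np.exp(beta) / np.exp(beta).sum()   # normalized attention weights alpha_{i,j}
    return alpha @ V                    # o_i = sum_j alpha_{i,j} * v_j

# Example with m = 3 positions and d_k = d_v = 4
rng = np.random.default_rng(0)
q = rng.standard_normal(4)
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
print(single_query_attention(q, K, V))  # a (4,)-vector: the weighted sum of the value rows
```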
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention Weight Matrix (α)
Sparse Attention
Self-attention layers' first approach
In a general attention mechanism, the output is calculated as a weighted sum of the Value vectors, where the weights are determined by the interaction between Query and Key vectors. The standard formula is $\mathbf{o} = \sum_{j} \alpha_{j}\, \mathbf{v}_j$. Consider a scenario where this formula is mistakenly altered. What is the most significant consequence of this modification?
Dimensional Analysis of the Attention Formula
Applying the Attention Mechanism Roles
Self-Attention Output Formula for a Single Query
In a self-attention mechanism, the raw attention scores (β) for a single query vector with respect to three key vectors are calculated as [2.0, 1.0, 0.5]. To convert these scores into a probability distribution, a normalization function is applied. What is the resulting normalized attention weight (α) corresponding to the first key vector (score of 2.0)?
In a self-attention mechanism, the raw, unnormalized attention scores for a specific query are [1.5, 0.5, -1.0]. If a constant value of 10 is added to each of these scores, resulting in a new set of scores [11.5, 10.5, 9.0], how will the final normalized attention weights (the probability distribution) calculated from the new scores compare to the weights calculated from the original scores? (A numeric check of this and the previous question appears after this list.)
Calculating and Interpreting Attention Weights
Self-Attention Output Formula for a Single Query
Computing Attention Weights in Sequence Parallelism
Distributed Attention Weight Formula
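The two numeric questions above can be checked directly. A small sketch, assuming the normalization function is the standard softmax (as the questions imply): it prints the normalized weights for the scores [2.0, 1.0, 0.5] and confirms that adding a constant to every score leaves the weights unchanged.

```python
import numpy as np

def softmax(beta):
    beta = np.asarray(beta, dtype=float)
    e = np.exp(beta - beta.max())        # subtracting the max does not change the result (see check below)
    return e / e.sum()

weights = softmax([2.0, 1.0, 0.5])       # first question: weight for the score 2.0
original = softmax([1.5, 0.5, -1.0])     # second question: original scores
shifted = softmax([11.5, 10.5, 9.0])     # the same scores with +10 added to each

print(weights)                            # approx [0.63, 0.23, 0.14]
print(np.allclose(original, shifted))     # True: softmax is invariant to adding a constant
```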