Formula

Single-Query Attention Computation with Scaled Dot-Product

The attention output for a single query vector $\mathbf{q}_i'$ is computed from the key matrix $\mathbf{K}$ and value matrix $\mathbf{V}$. The attention scores are obtained by taking the dot product of the query with the transposed key matrix and scaling the result by dividing by $\sqrt{d}$, where $d$ is the key dimensionality; this scaling keeps the scores in a range where the Softmax does not saturate. The Softmax function converts these scores into attention weights, which are then used to produce a weighted sum of the value vectors. The formula is: $\text{Att}_{qkv}(\mathbf{q}_i', \mathbf{K}, \mathbf{V}) = \text{Softmax}\left(\frac{\mathbf{q}_i' \mathbf{K}^T}{\sqrt{d}}\right) \mathbf{V}$
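The formula above can be sketched in NumPy. This is a minimal illustration, not the book's code; the function and variable names (`att_qkv`, `q`, `K`, `V`) are chosen here to mirror the notation, and `d` is taken from the key dimensionality.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def att_qkv(q, K, V):
    # q: query vector of shape (d,)
    # K: key matrix of shape (n, d)
    # V: value matrix of shape (n, d_v)
    d = K.shape[1]
    scores = q @ K.T / np.sqrt(d)   # scaled dot-product scores, shape (n,)
    weights = softmax(scores)       # attention weights, sum to 1
    return weights @ V              # weighted sum of value rows, shape (d_v,)
```

The output has the same dimensionality as a single value vector: each of the `n` rows of `V` contributes in proportion to its attention weight.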


Updated 2026-05-02


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
