Single-Query Attention Computation with Multiplicative Scaling
The attention output for a single query vector, $\mathbf{q}$, is computed based on the key matrix $K$ and value matrix $V$. This formulation calculates attention scores by taking the dot product of the query with the transposed key matrix and scaling the result by multiplying with $\frac{1}{\sqrt{d}}$, where $d$ is the key dimension. The Softmax function converts these scores into attention weights, which are then used to produce a weighted sum of the value vectors. The formula is:

$$\text{Attention}(\mathbf{q}, K, V) = \text{Softmax}\!\left(\mathbf{q} K^{\top} \cdot \frac{1}{\sqrt{d}}\right) V$$
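A minimal sketch of this computation, assuming NumPy; the function name `single_query_attention` and the dimension names (`d` for the query/key dimension, `n` for the number of keys) are illustrative choices, not part of the card:

```python
import numpy as np

def single_query_attention(q, K, V):
    """Attention output for one query q against keys K and values V.

    q: shape (d,)    -- single query vector
    K: shape (n, d)  -- one key vector per row
    V: shape (n, dv) -- one value vector per row
    """
    d = q.shape[0]
    # Dot product with the transposed key matrix, then multiplicative
    # scaling by 1/sqrt(d).
    scores = (q @ K.T) * (1.0 / np.sqrt(d))  # shape (n,)
    # Softmax turns the scores into attention weights that sum to 1
    # (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of the value vectors.
    return weights @ V  # shape (dv,)
```

As a quick check, `single_query_attention(np.ones(4), np.eye(4), np.eye(4))` yields equal scores for every key, so the weights are uniform and the output is `[0.25, 0.25, 0.25, 0.25]`, an even mixture of the rows of `V`.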
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Distributed Computation of Weighted Value Sums
Calculating an Attention Output Vector
In a self-attention mechanism, the output for a given input element is a weighted sum of 'value' vectors from all elements in the sequence. Consider the calculation for the word 'sat' in the phrase 'The cat sat on the mat'. If the attention weights from 'sat' to each word in the sequence (including itself) are: 'The': 0.05, 'cat': 0.45, 'sat': 0.05, 'on': 0.0, 'the': 0.0, 'mat': 0.45. Which of the following statements best describes the resulting output vector for 'sat'?
In a self-attention mechanism, the output for a specific token is calculated as a weighted sum of 'value' vectors from all tokens in the sequence. If the attention weight connecting a query token to a specific value token is exactly zero, that value token has no contribution to the final output for the query token.
Sequence Parallelism
Scaled Dot-Product Attention
General Attention Formula
Value Matrix for Causal Attention (V_≤i)
Value Matrix from a Sliding Window
An attention mechanism processes an input sequence of 20 tokens, where each token is represented by a 256-dimensional vector. A Value matrix (V) is generated as part of this process. Which of the following statements most accurately describes the properties and role of this V matrix?
Determining Value Matrix Dimensions
Debugging an Attention Mechanism
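The 'sat' question above can be checked with a short worked example. This is a sketch using made-up 4-dimensional value vectors (real value vectors are learned, not chosen); stacking the rows would give the V matrix discussed in the related questions (here 6×4, or 20×256 in the 20-token example):

```python
import numpy as np

# Hypothetical value vectors, one per word; placeholders for illustration.
values = {
    "The": np.array([1.0, 0.0, 0.0, 0.0]),
    "cat": np.array([0.0, 1.0, 0.0, 0.0]),
    "sat": np.array([0.0, 0.0, 1.0, 0.0]),
    "on":  np.array([0.0, 0.0, 0.0, 1.0]),
    "the": np.array([1.0, 1.0, 0.0, 0.0]),
    "mat": np.array([0.0, 0.0, 1.0, 1.0]),
}
weights = {"The": 0.05, "cat": 0.45, "sat": 0.05,
           "on": 0.0, "the": 0.0, "mat": 0.45}

# The output for 'sat' is the weighted sum of all value vectors; the
# zero-weight tokens ('on', 'the') contribute nothing to it.
output = sum(w * values[tok] for tok, w in weights.items())
print(output)  # [0.05 0.45 0.5  0.45] -- dominated by 'cat' and 'mat'
```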
Learn After
Causal Attention
In an attention mechanism, the scores for a query vector q are calculated by taking its dot product with a set of key vectors K. These scores are then scaled by a factor related to the vector dimension before being passed to a Softmax function to produce weights. A developer implements this but omits the scaling step, using the formula Softmax(q * K^T) * V. What is the most likely adverse effect of this omission, especially when the dimension of the key vectors is large?
Calculating Pre-Softmax Attention Scores
Applying Scaled Dot-Product Attention
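The 'omitted scaling' question in the list above can be made concrete with a small numerical sketch (NumPy assumed; the dimension 512 and the 8 keys are arbitrary illustrative choices). With large d, the unscaled dot products have magnitudes on the order of sqrt(d), so the Softmax saturates into a near one-hot distribution and its gradients vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                          # large key/query dimension
q = rng.standard_normal(d)
K = rng.standard_normal((8, d))  # 8 key vectors, one per row

def softmax(x):
    e = np.exp(x - x.max())      # shift by max for stability
    return e / e.sum()

unscaled = softmax(q @ K.T)               # omits the scaling step
scaled = softmax((q @ K.T) / np.sqrt(d))  # standard scaled version

# The unscaled weights collapse onto one or two keys (near one-hot),
# while the scaled weights remain comparatively smooth.
print(unscaled.round(3))
print(scaled.round(3))
```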