Formula for Single-Head Self-Attention

The formula for single-head self-attention computes the output for a single query vector $\mathbf{q}_{i'}$ given a set of key-value pairs:

$$\text{Att}_{\text{qkv}}(\mathbf{q}_{i'}, \mathbf{K}, \mathbf{V}) = \text{Softmax}\left(\frac{\mathbf{q}_{i'}\mathbf{K}^{\text{T}}}{\sqrt{d}}\right)\mathbf{V}$$

Here the value matrix $\mathbf{V} \in \mathbb{R}^{i' \times d}$, where $i'$ is the number of key-value pairs in the sequence and $d$ is the dimension of each value vector; the key matrix $\mathbf{K}$ has the same shape, since the product $\mathbf{q}_{i'}\mathbf{K}^{\text{T}}$ requires keys of the same dimension $d$. The computation proceeds in four steps: take the dot product of the query with every key, scale the scores by $\sqrt{d}$, apply the Softmax function to obtain attention weights, and form the weighted sum of the value vectors.
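To make the four steps concrete, here is a minimal NumPy sketch of this computation. The function name `att_qkv`, the `softmax` helper, and the toy shapes ($i' = 4$ key-value pairs, $d = 8$) are illustrative assumptions, not part of the original text.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def att_qkv(q, K, V):
    """Single-head self-attention for one query vector (illustrative sketch).

    q: query vector of shape (d,)
    K: key matrix of shape (i', d), one row per key-value pair
    V: value matrix of shape (i', d)
    Returns the attended output of shape (d,).
    """
    d = K.shape[-1]
    scores = q @ K.T / np.sqrt(d)  # dot product with all keys, scaled by sqrt(d)
    weights = softmax(scores)      # attention weights, summing to 1
    return weights @ V             # weighted sum of the value vectors

# Toy example: i' = 4 key-value pairs, d = 8 (hypothetical sizes).
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(att_qkv(q, K, V).shape)  # (8,)
```

Note that the output has the same dimension $d$ as a single value vector: attention returns one vector per query, regardless of how many key-value pairs it attends over.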

