Learn Before
Key Matrix for Causal Attention (K_≤i)
In causal or autoregressive attention mechanisms, the key matrix for a given position i, denoted K_≤i, is formed by vertically stacking all key vectors from the beginning of the sequence up to and including position i. This matrix represents the set of all keys that the query at position i is allowed to attend to. It is defined as:

K_≤i = [k_0; k_1; …; k_i]

where each k_j is a row vector, so K_≤i has i + 1 rows.
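The stacking described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the source; the helper name `keys_up_to` and the key dimension `d` are assumptions for the example.

```python
import numpy as np

def keys_up_to(keys, i):
    """Return K_<=i: the (i+1) x d matrix stacking keys k_0 ... k_i as rows."""
    return np.stack(keys[: i + 1], axis=0)

# Hypothetical key vectors of dimension d = 4 for a 6-token sequence.
d = 4
keys = [np.random.rand(d) for _ in range(6)]

K_le_2 = keys_up_to(keys, 2)
print(K_le_2.shape)  # (3, 4): rows are k_0, k_1, k_2
```

Note that the query at position i sees only the first i + 1 keys; keys at later positions are excluded, which is what enforces causality.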
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Set of Sequential Key-Value Pairs
Let a sequence of vectors be constructed where the first element is and the second element is . The third element has multiple potential versions, and the 5th version is given as . According to the notational definition , what is the specific sequence represented by when using the 5th version of the 3rd element?
Key Matrix for Causal Attention (K_≤i)
Deconstructing Vector Prefix Notation
Key-Value Cache
Consider a sequence of vectors represented as v_0, v_1, …, v_m. The notation v_≤1 represents the subsequence containing only the first two vectors, v_0 and v_1.
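Under zero-based indexing, the prefix notation ≤j selects the first j + 1 elements. A minimal sketch (the variable names are hypothetical, not from the source):

```python
# Hypothetical sequence of vectors v_0 .. v_3.
v = [[1, 0], [0, 1], [2, 2], [3, 3]]

def prefix(seq, j):
    """Return the subsequence v_<=j, i.e. [v_0, ..., v_j]."""
    return seq[: j + 1]

print(prefix(v, 1))  # [[1, 0], [0, 1]] -> the first two vectors
```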
Learn After
Causal Attention Input Structure
Enumeration of Dot Products in Causal Self-Attention
State Variables in Linear Attention (μ_i, ν_i)
In an autoregressive attention mechanism, a sequence of key vectors is generated. Given the first three key vectors k_0 = [1, 2], k_1 = [3, 4], and k_2 = [5, 6], which of the following matrices represents the complete set of keys that the query at position i = 2 is allowed to interact with?
Debugging a Causal Attention Implementation
In an autoregressive attention mechanism processing a sequence of 10 tokens (indexed 0 to 9), the matrix of key vectors used to compute the output for the token at position 3 is identical to the matrix of key vectors used for the token at position 7.
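The two questions above can be checked directly from the definition of K_≤i. The sketch below assumes zero-based positions and keys stacked as rows; the variable names are illustrative.

```python
import numpy as np

# First question: stack k_0, k_1, k_2 to get the keys visible at i = 2.
k0, k1, k2 = np.array([1, 2]), np.array([3, 4]), np.array([5, 6])
K_le_2 = np.stack([k0, k1, k2])  # rows are k_0, k_1, k_2

# Second question: for a 10-token sequence, K_<=3 stacks 4 keys while
# K_<=7 stacks 8, so the two matrices differ in shape and cannot match.
keys = [np.random.rand(2) for _ in range(10)]  # hypothetical keys
K_le_3 = np.stack(keys[:4])
K_le_7 = np.stack(keys[:8])
print(K_le_3.shape, K_le_7.shape)  # (4, 2) (8, 2)
```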