Definition

Key Matrix for Causal Attention (K_≤i)

In causal (autoregressive) attention mechanisms, the key matrix for a given position $i$, denoted $\mathbf{K}_{\le i}$, is formed by vertically stacking all key vectors from the beginning of the sequence up to and including position $i$. This matrix represents the set of all keys that the query at position $i$ is allowed to attend to. It is defined as:

$$\mathbf{K}_{\le i} = \begin{bmatrix} \mathbf{k}_0 \\ \vdots \\ \mathbf{k}_i \end{bmatrix}$$
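The stacking in this definition can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the helper names `causal_keys` and `attention_weights` are hypothetical, key vectors are assumed to be stored as rows of a full key matrix `K` of shape `(n, d)`, and the scaled dot-product softmax is an assumed choice of scoring function that the definition itself does not specify.

```python
import numpy as np


def causal_keys(K, i):
    """Return K_{<=i}: rows k_0 ... k_i of the full key matrix K (shape [n, d])."""
    if not 0 <= i < K.shape[0]:
        raise IndexError(f"position {i} is outside the sequence of length {K.shape[0]}")
    # Positions are zero-indexed, so the slice keeps i + 1 rows.
    return K[: i + 1]


def attention_weights(q_i, K_le_i):
    """Softmax weights of query q_i over the keys it may attend to.

    Assumption: scaled dot-product scoring, i.e. scores = K q / sqrt(d).
    """
    d = K_le_i.shape[-1]
    scores = K_le_i @ q_i / np.sqrt(d)
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


# Toy data: a sequence of n = 5 positions with d = 4 dimensional keys and queries.
rng = np.random.default_rng(0)
n, d = 5, 4
K = rng.standard_normal((n, d))
Q = rng.standard_normal((n, d))

i = 2
K_le_i = causal_keys(K, i)                 # rows k_0, k_1, k_2
weights = attention_weights(Q[i], K_le_i)  # one weight per allowed key
```

With `i = 2`, `K_le_i` has three rows, and `weights` has three entries that sum to 1. Keys at positions 3 and 4 never enter the computation, which is the causal restriction the definition describes.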


Updated 2026-04-22


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences