Concept

Causal Attention Input Structure

In a causal or autoregressive attention mechanism, the input for a given position $i$ is composed of the query vector for that specific position, $\mathbf{q}_i$, along with the key and value matrices that contain information from the beginning of the sequence up to and including position $i$. These historical matrices are often denoted as $\mathbf{K}_{\le i}$ and $\mathbf{V}_{\le i}$. For instance, when calculating attention for a token $y_{i'}$ at position $i'$, the query is $\mathbf{q}_{i'}$, and the corresponding key and value matrices, $\mathbf{K}_{\le i'}$ and $\mathbf{V}_{\le i'}$, encompass all key-value pairs generated up to that point. This structure ensures that the model's output at any step is only influenced by past and present information, adhering to the causal constraint.
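The per-position computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the function name `causal_attention_step` and the toy dimensions are assumptions for the example, and the position-$i$ output is computed from $\mathbf{q}_i$ together with only the first $i+1$ rows of the key and value matrices.

```python
import numpy as np

def causal_attention_step(q_i, K_le_i, V_le_i):
    """Attention output for one position: the query q_i (shape (d,))
    attends only to keys/values up to and including position i
    (shapes (i+1, d)). Illustrative sketch of scaled dot-product
    attention under the causal constraint."""
    d = q_i.shape[-1]
    scores = K_le_i @ q_i / np.sqrt(d)       # similarity to each past/present key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V_le_i                  # weighted sum of values, shape (d,)

# Toy example: a 4-token sequence with model dimension 8.
rng = np.random.default_rng(0)
d = 8
K = rng.normal(size=(4, d))
V = rng.normal(size=(4, d))
Q = rng.normal(size=(4, d))

# The output at position i uses only K[:i+1] and V[:i+1],
# so future tokens cannot influence it.
outputs = [causal_attention_step(Q[i], K[:i+1], V[:i+1]) for i in range(4)]
```

At position 0 the softmax runs over a single key, so the output is exactly the first value vector; at later positions it is a convex combination of all values seen so far.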


Updated 2026-05-02

