Causal Attention Input Structure
In a causal or autoregressive attention mechanism, the input for a given position is composed of the query vector for that specific position, q_i, along with the key and value matrices that contain information from the beginning of the sequence up to and including position i. These historical matrices are often denoted K_{≤i} and V_{≤i}. For instance, when calculating attention for a token at position i, the query is q_i, and the corresponding key and value matrices, K_{≤i} and V_{≤i}, encompass all key-value pairs generated up to that point. This structure ensures that the model's output at any step is influenced only by past and present information, adhering to the causal constraint.
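To make the shapes concrete, here is a minimal NumPy sketch of one causal attention step, assuming standard scaled dot-product attention; the names causal_attention_step, K_le_i, and V_le_i are illustrative, not from the source.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention_step(q_i, K_le_i, V_le_i):
    # q_i    : shape (d,)     -- query for the current position i
    # K_le_i : shape (i+1, d) -- keys k_0 .. k_i (K_{≤i})
    # V_le_i : shape (i+1, d) -- values v_0 .. v_i (V_{≤i})
    d = q_i.shape[-1]
    scores = K_le_i @ q_i / np.sqrt(d)  # one score per visible position
    weights = softmax(scores)           # (i+1,) attention weights
    return weights @ V_le_i             # weighted sum of values, shape (d,)

# Example: at position i = 2 the query sees only keys/values 0..2.
rng = np.random.default_rng(0)
K = rng.normal(size=(10, 4))  # keys for a 10-token sequence
V = rng.normal(size=(10, 4))  # values for the same sequence
out = causal_attention_step(rng.normal(size=(4,)), K[:3], V[:3])

Slicing K[:3] and V[:3] is exactly the causal constraint: rows past position i simply never enter the computation.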

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Related
Causal Attention Input Structure
Causal Attention Mask Matrix Definition
Causal Attention Weight Matrix Calculation
An engineer is implementing an attention mechanism where the output is a weighted sum of Value vectors, with weights determined by a Softmax function applied to scores. They observe that as the dimension (d) of the Query and Key vectors increases, the attention weights become extremely concentrated on a single position (e.g., [0.01, 0.98, 0.01]), causing training instability. The scores are derived from the dot product of Query (Q) and Key (K) matrices. What is the most likely cause of this issue? (A numeric sketch of this effect follows this list.)
Attention Mechanism Misapplication in Summarization
Analyzing the Role of the Mask in Attention
Selecting an Attention Design for Long-Context, Low-Latency Inference
Diagnosing and Redesigning Attention for a Long-Context, Cost-Constrained LLM Service
Choosing an Attention Stack for a Regulated, Long-Document Review Assistant
Attention Redesign for a Long-Context Customer-Support Copilot Under GPU Memory Pressure
Attention Redesign for a Multi-Tenant LLM with Long Context and Strict KV-Cache Budgets
Attention Architecture Choice for On-Device Meeting Summarization with 60k Context
You’re debugging an LLM inference service that mus...
You’re reviewing a design doc for a Transformer at...
Your team is deploying a chat-based LLM that must ...
You’re leading an LLM platform team that must supp...
Causal Attention Input Structure
Enumeration of Dot Products in Causal Self-Attention
State Variables in Linear Attention (μ_i, ν_i)
In an autoregressive attention mechanism, a sequence of key vectors is generated. Given the first three key vectors k_0 = [1, 2], k_1 = [3, 4], and k_2 = [5, 6], which of the following matrices represents the complete set of keys that the query at position i = 2 is allowed to interact with? (A worked version follows this list.)
Debugging a Causal Attention Implementation
In an autoregressive attention mechanism processing a sequence of 10 tokens (indexed 0 to 9), the matrix of key vectors used to compute the output for the token at position 3 is identical to the matrix of key vectors used for the token at position 7.
Causal Attention Input Structure
An autoregressive model processes an input sequence of 5 tokens, indexed 0 through 4. When calculating the output for the token at index 3, the attention mechanism needs to access a specific set of 'value' vectors from the sequence. Which of the following correctly describes the collection of value vectors available to the query at index 3?
Causal Attention Value Matrix Dimensions
An autoregressive model processes an input sequence one token at a time. At each position i, it constructs a matrix containing all value vectors from the beginning of the sequence up to and including position i. Arrange the matrices below in the order they would be constructed as the model processes the first three positions (indexed 0, 1, and 2).
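For the dot-product concentration question above: with unit-variance Query/Key entries, the variance of a raw dot product grows linearly with d, so score gaps grow like sqrt(d) and the Softmax saturates toward one-hot; dividing the scores by sqrt(d) keeps the weights soft. A small numeric sketch under those assumptions (unit-variance random vectors, three keys):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
for d in (4, 64, 1024):
    q = rng.normal(size=d)       # unit-variance query
    K = rng.normal(size=(3, d))  # three unit-variance keys
    raw = K @ q                  # score variance grows with d
    # Unscaled weights typically collapse toward one-hot as d grows;
    # the 1/sqrt(d)-scaled weights stay comparatively soft.
    print(d, softmax(raw).round(3), softmax(raw / np.sqrt(d)).round(3))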
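And for the worked-keys question: the query at position i = 2 may interact with every key up to and including its own, so the allowed set is all three given vectors (stacked here as rows, an assumed but conventional layout):

K_{≤2} = [[1, 2],
          [3, 4],
          [5, 6]]

Nothing is masked out, because position 2 is the latest position generated so far.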
Learn After
Computational Cost per Token in Causal Attention
Reusability of Key-Value Pairs in Autoregressive Inference
Example of Query-Key Interactions in Causal Attention
An autoregressive model is generating a sequence of tokens one by one. It is currently calculating the attention output for the token at position 4 (i.e., the fifth token in the sequence). To ensure the model only uses information it has already seen, which set of key (K) and value (V) vectors must be used as input to the attention mechanism for the query vector at position 4 (q₄)? (See the decoding sketch after this list.)
Diagnosing Information Leakage in an Autoregressive Model
When calculating the attention output for a specific token at position i in an autoregressive model, the mechanism is structured to use the query vector from that same position (q_i), while the key and value matrices are composed of the corresponding vectors from all positions in the full input sequence.
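As a sketch of the pattern asked about in the position-4 question above (and the ordering exercise under Related), assuming token-at-a-time decoding where each step appends one key/value row; variable names are illustrative:

import numpy as np

d = 4
rng = np.random.default_rng(2)
keys, values = [], []
for i in range(5):                     # positions 0 .. 4
    keys.append(rng.normal(size=d))    # k_i becomes visible at step i
    values.append(rng.normal(size=d))  # v_i likewise
    K_le_i = np.stack(keys)            # (i+1, d): rows k_0 .. k_i
    V_le_i = np.stack(values)          # (i+1, d): V_{≤0}, then V_{≤1}, ...

# After step 4, q_4 attends over exactly {k_0..k_4} and {v_0..v_4};
# rows from earlier steps are reused unchanged, which is what a KV cache exploits.
print(K_le_i.shape, V_le_i.shape)      # (5, 4) (5, 4)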