Definition

State Variables in Linear Attention (μ_i, ν_i)

In certain linear attention variants, two state variables, $\mu_i$ and $\nu_i$, summarize the entire history of key-value pairs up to position $i$. The state $\mu_i$ is the cumulative sum of outer products between the transformed key vectors and their corresponding value vectors, $\mu_i = \sum_{j=0}^{i} \mathbf{k'}_j^{\mathrm{T}} \mathbf{v}_j$. The state $\nu_i$ is the cumulative sum of the transformed key vectors, $\nu_i = \sum_{j=0}^{i} \mathbf{k'}_j^{\mathrm{T}}$. Both sums can be updated incrementally: $\mu_i = \mu_{i-1} + \mathbf{k'}_i^{\mathrm{T}} \mathbf{v}_i$ and $\nu_i = \nu_{i-1} + \mathbf{k'}_i^{\mathrm{T}}$. As a result, the attention mechanism does not need to re-access the full history at each step.
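The Python sketch below shows how the two states can be maintained one position at a time. It makes two assumptions that the text above does not fix. First, it uses the common formulation in which the output at position $i$ is $\mathbf{q'}_i \mu_i / (\mathbf{q'}_i \nu_i)$. Second, it produces the transformed vectors with the feature map $\phi(x) = \mathrm{elu}(x) + 1$, which is one illustrative choice among several. The function names are hypothetical. To confirm the two approaches agree, the sketch compares the recurrent result against a direct computation that rereads the whole prefix at every step.

```python
import numpy as np

def feature_map(x):
    # Assumed feature map phi(x) = elu(x) + 1 (illustrative choice);
    # it keeps every entry positive so the normalizer q'_i nu_i is > 0.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(Q, K, V):
    """Causal linear attention using running states mu_i and nu_i.

    Q, K: (n, d_k) arrays; V: (n, d_v) array. Each row is one position's
    (row) vector, matching the k'_j^T v_j outer-product notation.
    """
    n, d_k = K.shape
    d_v = V.shape[1]
    mu = np.zeros((d_k, d_v))  # running sum of outer products k'_j^T v_j
    nu = np.zeros(d_k)         # running sum of transformed keys k'_j^T
    out = np.empty((n, d_v))
    for i in range(n):
        q_i = feature_map(Q[i])
        k_i = feature_map(K[i])
        mu += np.outer(k_i, V[i])  # mu_i = mu_{i-1} + k'_i^T v_i
        nu += k_i                  # nu_i = nu_{i-1} + k'_i^T
        # Assumed output rule: q'_i mu_i / (q'_i nu_i)
        out[i] = (q_i @ mu) / (q_i @ nu)
    return out

def linear_attention_direct(Q, K, V):
    """Same quantity computed by re-reading positions 0..i at every step."""
    Qf, Kf = feature_map(Q), feature_map(K)
    n = Q.shape[0]
    out = np.empty((n, V.shape[1]))
    for i in range(n):
        weights = Kf[: i + 1] @ Qf[i]  # q'_i k'_j^T for j = 0..i
        out[i] = (weights @ V[: i + 1]) / weights.sum()
    return out

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 4))
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 3))
print(np.allclose(linear_attention_recurrent(Q, K, V),
                  linear_attention_direct(Q, K, V)))
```

Each recurrent step costs $O(d_k d_v)$ no matter how large $i$ is, because only $\mu_i$ and $\nu_i$ are carried forward. The direct version, by contrast, must touch all $i+1$ earlier keys and values.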

Updated 2026-04-22

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences