1Cademy - Recurrent Computation of $\mu_i$ and $\nu_i$

Learn Before

Linear Causal Attention Formula
Recurrent Models

Formula

Recurrent Computation of $\mu_i$ and $\nu_i$ in Linear Attention

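In linear attention, the variables $\mu_i$ and $\nu_i$ summarize the sequence history up to position $i$. Both are computed recurrently: each step adds only the contribution of the current position to the previous state, so the full history never has to be revisited. The updates are $\mu_i = \mu_{i-1} + \mathbf{k'}_{i}^{\mathrm{T}} \mathbf{v}_{i}$ and $\nu_i = \nu_{i-1} + \mathbf{k'}_{i}^{\mathrm{T}}$, where $\mathbf{k'}_{i}$ and $\mathbf{v}_{i}$ are the key and value vectors at position $i$.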
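The two updates can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: it assumes row vectors stored as lists, so that $\mathbf{k'}_{i}^{\mathrm{T}} \mathbf{v}_{i}$ is an outer product, and it assumes both states start at zero. The helper names `outer`, `step`, and `run` are hypothetical.

```python
def outer(k, v):
    """Outer product k^T v of two row vectors given as lists."""
    return [[k_a * v_b for v_b in v] for k_a in k]


def step(mu, nu, k, v):
    """One recurrent update: (mu_{i-1}, nu_{i-1}) -> (mu_i, nu_i)."""
    kv = outer(k, v)
    # mu_i = mu_{i-1} + k'_i^T v_i
    new_mu = [[m + x for m, x in zip(mu_row, kv_row)]
              for mu_row, kv_row in zip(mu, kv)]
    # nu_i = nu_{i-1} + k'_i^T
    new_nu = [n + k_a for n, k_a in zip(nu, k)]
    return new_mu, new_nu


def run(keys, values):
    """Apply the update over a whole sequence, starting from zero states (an assumption)."""
    d, d_v = len(keys[0]), len(values[0])
    mu = [[0.0] * d_v for _ in range(d)]
    nu = [0.0] * d
    for k, v in zip(keys, values):
        mu, nu = step(mu, nu, k, v)
    return mu, nu


# Toy data chosen for illustration only.
keys = [[1.0, 2.0], [0.5, 1.0], [3.0, 0.0]]
values = [[1.0, 0.0, 2.0], [2.0, 2.0, 2.0], [0.0, 1.0, 1.0]]
mu, nu = run(keys, values)
print(mu)  # [[2.0, 4.0, 6.0], [4.0, 2.0, 6.0]]
print(nu)  # [4.5, 3.0]
```

Each call to `step` touches only the previous state and the current key and value, so the work per position does not depend on how many positions came before.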

Updated 2026-05-02

References

Reference of Foundations of Large Language Models Course

Learn After

Computational and Memory Efficiency of Linear Attention's Recurrent Method
A sequential model updates two history-representing variables, $\mu$ and $\nu$, at each step $i$ using the following rules:

$\mu_i = \mu_{i-1} + \mathbf{k'}_{i}^{\mathrm{T}} \mathbf{v}_{i}$ and $\nu_i = \nu_{i-1} + \mathbf{k'}_{i}^{\mathrm{T}}$

Consider the update at a single step $i$. If the input value vector $\mathbf{v}_{i}$ is a zero vector (a vector of all zeros), but the input key vector $\mathbf{k'}_{i}$ is a non-zero vector, what is the outcome of the update from step $i-1$ to step $i$?
Recurrent State Update Calculation
Unrolling a Recurrent State Update
Linear Attention Output Calculation
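The single-step question above, with a zero value vector and a non-zero key vector, can be checked numerically. The sketch below makes the same assumptions as the update rules (row vectors, outer product for $\mathbf{k'}_{i}^{\mathrm{T}} \mathbf{v}_{i}$); the function name `update` and the numbers are made up for illustration.

```python
def update(mu, nu, k, v):
    """One step of the recurrent rules for mu and nu (row vectors as lists)."""
    # mu_i = mu_{i-1} + k'_i^T v_i, where k'_i^T v_i is an outer product
    new_mu = [[m + k_a * v_b for m, v_b in zip(mu_row, v)]
              for mu_row, k_a in zip(mu, k)]
    # nu_i = nu_{i-1} + k'_i^T
    new_nu = [n + k_a for n, k_a in zip(nu, k)]
    return new_mu, new_nu


# Hypothetical previous state and inputs.
mu_prev = [[1.0, 2.0], [3.0, 4.0]]
nu_prev = [0.5, 0.5]
k = [2.0, 1.0]   # non-zero key vector
v = [0.0, 0.0]   # zero value vector

mu_next, nu_next = update(mu_prev, nu_prev, k, v)
print(mu_next == mu_prev)  # True: the outer product with a zero vector is all zeros
print(nu_next)             # [2.5, 1.5]: nu still accumulates the key
```

In this example $\mu$ is unchanged because the outer product vanishes, while $\nu$ changes because its update depends on the key alone.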
