Learn Before
Linear Attention Output Calculation
In this variant of linear attention, the final output is calculated by combining the current transformed query vector q'_i with the accumulated state variables μ_i and ν_i. The numerator is the product of the query and the key-value state μ_i, while the denominator is the product of the query and the key state ν_i, serving as a normalization term. The formula is: output_i = (q'_i μ_i) / (q'_i ν_i). This approach replaces the standard Softmax operation with simpler matrix-vector products, leading to computational savings.
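The recurrent computation described above can be sketched in plain Python. This is a minimal illustration, not an authoritative implementation: the function names (`update_state`, `attention_output`), the toy dimensions, and the use of nested lists instead of a tensor library are all assumptions for clarity.

```python
def update_state(mu, nu, k_prime, v):
    # One recurrent step:
    #   mu_i = mu_{i-1} + k'_i^T v_i   (outer product of key and value)
    #   nu_i = nu_{i-1} + k'_i         (accumulated key state)
    d, dv = len(k_prime), len(v)
    new_mu = [[mu[a][b] + k_prime[a] * v[b] for b in range(dv)] for a in range(d)]
    new_nu = [nu[a] + k_prime[a] for a in range(d)]
    return new_mu, new_nu

def attention_output(q_prime, mu, nu):
    # output_i = (q'_i mu_i) / (q'_i nu_i)
    # Numerator: query times key-value state; denominator: query times
    # key state, acting as the normalization term.
    d, dv = len(q_prime), len(mu[0])
    numer = [sum(q_prime[a] * mu[a][b] for a in range(d)) for b in range(dv)]
    denom = sum(q_prime[a] * nu[a] for a in range(d))
    return [x / denom for x in numer]

# Toy example with 2-dimensional keys/queries and values.
mu = [[0.0, 0.0], [0.0, 0.0]]
nu = [0.0, 0.0]
mu, nu = update_state(mu, nu, k_prime=[1.0, 1.0], v=[2.0, 4.0])
out = attention_output(q_prime=[1.0, 0.0], mu=mu, nu=nu)
```

No Softmax is evaluated anywhere: each step is just accumulation plus matrix-vector products, which is where the computational savings come from.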

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Computational and Memory Efficiency of Linear Attention's Recurrent Method
A sequential model updates two history-representing variables, μ and ν, at each step i using the following rules: μ_i = μ_{i-1} + k'_i^T v_i and ν_i = ν_{i-1} + k'_i^T. Consider the update at a single step i. If the input value vector v_i is a zero vector (a vector of all zeros), but the input key vector k'_i is a non-zero vector, what is the outcome of the update from step i-1 to step i?
Recurrent State Update Calculation
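The zero-value-vector case in the question above can be checked numerically. This is a hedged sketch assuming the update rules μ_i = μ_{i-1} + k'_i^T v_i and ν_i = ν_{i-1} + k'_i; the helper name `update_state` and the toy values are illustrative, not from the source.

```python
def update_state(mu, nu, k_prime, v):
    # mu_i = mu_{i-1} + k'_i^T v_i ; nu_i = nu_{i-1} + k'_i
    d, dv = len(k_prime), len(v)
    new_mu = [[mu[a][b] + k_prime[a] * v[b] for b in range(dv)] for a in range(d)]
    new_nu = [nu[a] + k_prime[a] for a in range(d)]
    return new_mu, new_nu

mu0 = [[0.0, 0.0], [0.0, 0.0]]
nu0 = [0.0, 0.0]
k = [1.0, 2.0]        # non-zero key vector
v_zero = [0.0, 0.0]   # zero value vector

mu1, nu1 = update_state(mu0, nu0, k, v_zero)
# mu is unchanged (k'^T v is the zero matrix), but nu still accumulates k'.
```

So a zero value vector leaves the key-value state μ untouched, while the key state ν is still updated by the non-zero key.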
Unrolling a Recurrent State Update
Linear Attention Output Calculation
Learn After
In the formula for calculating a linear attention output, Output = (q'_i * μ_i) / (q'_i * ν_i), where q'_i is the transformed query, μ_i is the accumulated key-value state, and ν_i is the accumulated key state, what is the primary role of the denominator term q'_i * ν_i?
Calculating a Linear Attention Output Vector