Linear Causal Attention Formula

The output of standard query-key-value attention, $\mathrm{Att}_{\mathrm{qkv}}$, can be approximated by linear attention, $\mathrm{Att}_{\mathrm{linear}}$. The approximation is computed by dividing the product of the transformed query vector, $\mathbf{q}'_i$, and the accumulated key-value state, $\mu_i$, by the product of the transformed query and the accumulated key state, $\nu_i$:

$$\mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i,\mathbf{K}_{\le i},\mathbf{V}_{\le i}) \approx \mathrm{Att}_{\mathrm{linear}}(\mathbf{q}'_i,\mathbf{K}'_{\le i},\mathbf{V}_{\le i}) = \frac{\mathbf{q}'_{i}\,\mu_i}{\mathbf{q}'_{i}\,\nu_i}$$
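As a concrete illustration, the formula can be sketched in code. This is a minimal NumPy sketch, not a reference implementation: the feature map `phi` (here ELU + 1) and the small epsilon in the denominator are assumptions, since the formula itself does not fix a particular query/key transformation. At each position $i$, $\mu_i$ accumulates the outer products $\phi(\mathbf{k}_j)\mathbf{v}_j^\top$ and $\nu_i$ accumulates $\phi(\mathbf{k}_j)$, so the causal restriction to positions $j \le i$ comes for free from the running sums.

```python
import numpy as np

def phi(x):
    # Assumed kernel feature map: elu(x) + 1, a common choice that
    # keeps features strictly positive so the denominator is nonzero.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_causal_attention(Q, K, V):
    """Causal linear attention via running accumulators.

    mu accumulates phi(k_j) v_j^T  (key-value state),
    nu accumulates phi(k_j)        (key state),
    so position i only ever sees positions j <= i.
    """
    n, d = Q.shape
    dv = V.shape[1]
    mu = np.zeros((d, dv))
    nu = np.zeros(d)
    out = np.zeros((n, dv))
    for i in range(n):
        q = phi(Q[i])                    # transformed query q'_i
        mu += np.outer(phi(K[i]), V[i])  # update key-value state
        nu += phi(K[i])                  # update key state
        out[i] = (q @ mu) / (q @ nu + 1e-9)  # q'_i mu_i / q'_i nu_i
    return out
```

Because the two accumulators have fixed size regardless of sequence length, each step costs O(d·dv) time and memory, which is the point of the linear-attention approximation: no n×n attention matrix is ever materialized.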

Updated 2026-04-22

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences