Learn Before
Linear Causal Attention Formula
The output of standard query-key-value attention can be approximated by linear attention. This approximation is computed by dividing the product of the transformed query vector, q'_i, and the accumulated key-value state, μ_i, by the product of the transformed query and the accumulated key state, ν_i:

Output_i = (q'_i * μ_i) / (q'_i * ν_i)
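To make the formula concrete, here is a minimal NumPy sketch of causal linear attention. The names (linear_causal_attention, feature_map), the elu(x)+1 kernel feature map, and the small epsilon added to the denominator are illustrative assumptions rather than details from this card; the sketch only shows how μ_i and ν_i can be accumulated recurrently and how each output is normalized by q'_i * ν_i.

```python
import numpy as np

def feature_map(x):
    # Assumed kernel feature map: elu(x) + 1, which keeps the
    # transformed queries and keys positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_causal_attention(Q, K, V):
    """Causal linear attention over a sequence.

    Q, K: (seq_len, d_k), V: (seq_len, d_v).
    Maintains the accumulated key-value state mu (d_k, d_v) and the
    accumulated key state nu (d_k,), updated once per position.
    """
    seq_len, d_k = Q.shape
    d_v = V.shape[1]
    Qp = feature_map(Q)           # q'_i for every position
    Kp = feature_map(K)           # k'_i for every position

    mu = np.zeros((d_k, d_v))     # accumulated key-value state
    nu = np.zeros(d_k)            # accumulated key state
    out = np.zeros((seq_len, d_v))

    for i in range(seq_len):
        mu += np.outer(Kp[i], V[i])       # add k'_i v_i^T to the key-value state
        nu += Kp[i]                       # add k'_i to the key state
        numer = Qp[i] @ mu                # q'_i * mu_i  -> vector of size d_v
        denom = Qp[i] @ nu                # q'_i * nu_i  -> scalar normalizer
        out[i] = numer / (denom + 1e-6)   # normalized output for position i
    return out
```

With random Q, K, V arrays of shape (seq_len, d), each row of the result follows the formula above, and only μ and ν are carried from one position to the next, so memory stays constant in sequence length.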
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Linear Causal Attention Formula
Normalization Transformation in Linear Attention
A language model is being optimized to process very long sequences of text while minimizing memory consumption during inference. The standard attention mechanism is replaced with an alternative approach that applies a kernel function to the query and key vectors and omits the Softmax operation. This change allows the order of matrix multiplications to be rearranged. Which of the following best analyzes the primary benefit of this modification?
Optimizing a Long-Context Language Model
A language model is being modified to use a memory-efficient attention mechanism for processing long documents. This involves altering the standard attention calculation. Arrange the following steps in the logical order they occur in this modified process.
You’re leading an LLM platform team that must supp...
You’re debugging an LLM inference service that mus...
Your team is deploying a chat-based LLM that must ...
Selecting an Attention Design for Long-Context, Low-Latency Inference
Diagnosing and Redesigning Attention for a Long-Context, Cost-Constrained LLM Service
Choosing an Attention Stack for a Regulated, Long-Document Review Assistant
You’re reviewing a design doc for a Transformer at...
Attention Redesign for a Long-Context Customer-Support Copilot Under GPU Memory Pressure
Attention Architecture Choice for On-Device Meeting Summarization with 60k Context
Attention Redesign for a Multi-Tenant LLM with Long Context and Strict KV-Cache Budgets
Learn After
In a memory-efficient attention mechanism, the output for a token at position i is calculated using the formula: Output = (q'_i * μ_i) / (q'_i * ν_i). In this formula, q'_i is the token's processed query, while μ_i and ν_i are aggregations of historical information from all tokens up to and including position i. Specifically, μ_i aggregates past key-value products, and ν_i aggregates past keys. What is the primary function of the denominator, q'_i * ν_i?
Efficiency of Aggregated State in Attention
Evaluating a Modification to the Linear Attention Formula
In the formula for calculating a linear attention output, Output = (q'_i * μ_i) / (q'_i * ν_i), where q'_i is the transformed query, μ_i is the accumulated key-value state, and ν_i is the accumulated key state, what is the primary role of the denominator term q'_i * ν_i?
Calculating a Linear Attention Output Vector
Recurrent Computation of μ_i and ν_i in Linear Attention