Vector Products per Self-Attention Step
During a single step of standard autoregressive generation, attending the current position to all $m$ previous context positions requires exactly $2m$ vector products. This total comprises the $m$ dot products needed for the query-key scores ($\mathbf{q}\mathbf{K}^\top$), plus an additional $m$ products to multiply the Softmax-normalized attention scores with the value matrix ($\mathrm{Softmax}\!\left(\frac{\mathbf{q}\mathbf{K}^\top}{\sqrt{d}}\right)\mathbf{V}$).
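To make the count concrete, here is a minimal NumPy sketch of one decoding step (single-head attention, no batching; the function name `one_attention_step` and the sizes `m = 8`, `d = 4` are illustrative, not from the original note):

```python
import numpy as np

def one_attention_step(q, K, V):
    """One autoregressive attention step; returns output and vector-product count.

    q: (d,)    query for the current position
    K: (m, d)  keys for the m previous context positions
    V: (m, d)  values for the m previous context positions
    """
    m, d = K.shape
    # m vector products: one dot product q . k_j per context position
    scores = K @ q / np.sqrt(d)          # (m,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # Softmax over the m scores (scalar work only)
    # m more vector products: one scalar-vector product weight_j * v_j, summed
    out = weights @ V                    # (d,)
    return out, 2 * m                    # exactly 2m vector products

# Usage: m = 8 context positions, d = 4 dimensions
rng = np.random.default_rng(0)
m, d = 8, 4
out, n_products = one_attention_step(rng.normal(size=d),
                                     rng.normal(size=(m, d)),
                                     rng.normal(size=(m, d)))
print(n_products)  # 16 == 2 * m
```

The Softmax itself only adds scalar work on the $m$ scores, which is why the count stays at exactly $2m$ vector products.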
Tags
Foundations of Large Language Models
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A team is optimizing a text-generation model where the computational cost is dominated by the self-attention mechanism during autoregressive decoding. They need to decide between two potential upgrades:
- Upgrade A: Doubling the number of layers in the model while keeping the maximum sequence length the same.
- Upgrade B: Doubling the maximum sequence length the model can handle while keeping the number of layers the same.
Assuming the model generates a sequence that fills its maximum length capacity in both scenarios, which upgrade would lead to a greater increase in the total computation time, and what is the nature of that increase?
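A back-of-the-envelope sketch of this trade-off, assuming the per-step cost from the note above ($2m$ vector products at step $m$), that decoding fills the maximum length, and that attention dominates all other costs; the helper name `total_attention_products` and the baseline values (12 layers, length 1024) are illustrative:

```python
def total_attention_products(num_layers, max_len):
    """Vector products to decode a full sequence: 2m per layer at each step m."""
    return num_layers * sum(2 * m for m in range(1, max_len + 1))

base      = total_attention_products(num_layers=12, max_len=1024)
upgrade_a = total_attention_products(num_layers=24, max_len=1024)  # 2x layers
upgrade_b = total_attention_products(num_layers=12, max_len=2048)  # 2x length

print(upgrade_a / base)  # 2.0   -> grows linearly with layer count
print(upgrade_b / base)  # ~4.0  -> grows quadratically with sequence length
```

Under these assumptions, doubling the layer count scales total attention work linearly (2x), while doubling the maximum sequence length scales it roughly quadratically (~4x).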
Derivation of Quadratic Complexity in Autoregressive Attention
Performance Bottleneck in a Generative Model