Time Complexity of Self-Attention in Autoregressive Generation

The overall time complexity of the self-attention mechanism when generating a sequence of length len with an L-layer Transformer is O(L × len²). The quadratic term arises because at generation step i the new token's query attends to all i previously generated positions, a per-step cost of O(i); summing O(i) over the len generation steps yields O(len²). This cost is then multiplied by L, since the entire process is repeated in each layer of the Transformer stack.
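The summation above can be made concrete with a small counting sketch. This is an illustrative model of the per-step attention cost, not an implementation; the function name and the unit "one query-key interaction per cached position" are assumptions for the example.

```python
def attention_cost(seq_len: int, n_layers: int) -> int:
    """Count query-key interactions for autoregressive self-attention.

    At generation step i, the new query attends to all i positions
    produced so far (cost O(i) per layer). Summing i = 1..seq_len gives
    seq_len * (seq_len + 1) / 2, i.e. O(seq_len^2), repeated per layer.
    """
    per_layer = sum(i for i in range(1, seq_len + 1))
    return n_layers * per_layer

# Doubling the sequence length roughly quadruples the total cost,
# confirming the quadratic growth in len.
print(attention_cost(4, 1))    # → 10  (1 + 2 + 3 + 4)
print(attention_cost(100, 6))  # → 30300
```

Note that with a KV cache each step still costs O(i) in attention, so caching removes redundant recomputation of keys and values but does not change the O(L × len²) total for attention itself.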


Updated 2026-05-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences