Idea

Claimed Linear Time Complexity of Self-Attention in Autoregressive Generation

An assertion has been made that the time complexity of self-attention at each step of generating a sequence of length $len$, across $L$ layers, is linear: $O(L \times len)$. The claim rests on the cost of the two key products computed at each generation step: the dot product between the query and the cached key vectors ($\mathbf{q}^{\top}\mathbf{K}$) and the product of the Softmax output with the cached value vectors. With a KV cache, the new position contributes a single $d$-dimensional query, so each product touches the $len$ cached vectors and costs $O(len \times d)$; treating the model dimension $d$ as a constant and repeating this across $L$ layers yields $O(L \times len)$ per step.
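As a minimal sketch of this accounting (assuming NumPy; the function name, cache layout, and sizes below are illustrative, not from the source), one decoding step computes exactly the two products named above, each over the $len$ cached vectors:

```python
import numpy as np

def attention_step(q, K_cache, V_cache):
    """One self-attention decoding step with a KV cache.

    q:        (d,)      query for the newly generated position
    K_cache:  (len, d)  keys for all positions so far
    V_cache:  (len, d)  values for all positions so far
    """
    d = q.shape[0]
    # q'K: one dot product per cached key -> O(len * d) operations
    scores = K_cache @ q / np.sqrt(d)           # (len,)
    # Softmax over the len attention scores -> O(len)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Softmax output times V: weighted sum of len value vectors -> O(len * d)
    return weights @ V_cache                    # (d,)

# Illustrative sizes: with d fixed, repeating this over L layers
# makes one generation step cost O(L * len).
rng = np.random.default_rng(0)
length, d = 128, 64
q = rng.standard_normal(d)
K = rng.standard_normal((length, d))
V = rng.standard_normal((length, d))
out = attention_step(q, K, V)
print(out.shape)  # (64,)
```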


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course