1Cademy - In a model that generates text one token at a time, suppose it has already produced a sequence of length `N` and is now calculating the next token (at position `N+1`). Which of the following best identifies the two primary computational operations within the attention mechanism that cause the cost of this single step to scale linearly with the current sequence length `N`?

Learn Before

Computational Cost per Token in Causal Attention

Multiple Choice

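A minimal sketch of one decoding step for a single attention head, written in plain Python so every multiply-add can be counted. It assumes standard scaled dot-product attention and a key/value cache, meaning the keys and values of earlier positions are reused rather than recomputed. The function names `decode_step_attention` and `random_vec` are hypothetical and used only for illustration. Two loops touch every cached position: computing the query-key scores and taking the softmax-weighted sum of the value vectors. With head dimension d and n visible positions, the step performs 2 × n × d multiply-adds, which is linear in the sequence length.

```python
import math
import random


def decode_step_attention(q, keys, values):
    """One causal-attention step for a single new query (one head).

    Hypothetical helper. Assumes a KV cache: `keys` and `values` hold one
    d-dimensional vector per visible position (earlier tokens plus the
    current one). Returns (output_vector, multiply_add_count).
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    mults = 0

    # Operation 1: score the new query against every cached key.
    # One d-dimensional dot product per visible position: O(n * d).
    scores = []
    for k in keys:
        s = 0.0
        for qi, ki in zip(q, k):
            s += qi * ki
            mults += 1
        scores.append(s * scale)

    # Numerically stable softmax over the scores: O(n), no factor of d.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Operation 2: weighted sum of every cached value vector: O(n * d).
    out = [0.0] * d
    for w, v in zip(weights, values):
        for j in range(d):
            out[j] += w * v[j]
            mults += 1
    return out, mults


def random_vec(d, rng):
    # Hypothetical helper: a random d-dimensional vector for the demo.
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 8
    for n in (16, 32, 64):
        # n earlier positions plus the current token's own key/value.
        keys = [random_vec(d, rng) for _ in range(n + 1)]
        values = [random_vec(d, rng) for _ in range(n + 1)]
        q = random_vec(d, rng)
        _, mults = decode_step_attention(q, keys, values)
        print(n, mults)  # 272, 528, 1040 for n = 16, 32, 64 (2 * (n + 1) * d)
```

Doubling the number of cached positions roughly doubles the count. The softmax between the two operations is also linear in the sequence length, but it has no factor of d, so the score computation and the value-weighted sum dominate.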

Updated 2025-09-26
