An autoregressive language model is generating a sequence of tokens one by one. As the length of the generated sequence increases from 10 tokens to 100 tokens, what is the primary impact of the evolving key-value cache on the computation required to generate the next token?
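The effect the question asks about can be sketched in a few lines. This is a hypothetical minimal simulation (not any particular library's API): with a KV cache, the keys and values of past tokens are stored and reused, so generating the next token costs one query against all cached entries, and that cost grows linearly with the sequence length rather than requiring recomputation of the whole prefix.

```python
import numpy as np

d = 8  # head dimension (illustrative)
rng = np.random.default_rng(0)

# The KV cache starts empty and grows by one entry per generated token.
kv_cache = {"K": np.empty((0, d)), "V": np.empty((0, d))}
costs = []

for step in range(1, 101):
    # Query/key/value for the NEW token only (stand-ins for the real projections).
    q = rng.normal(size=(1, d))
    k = rng.normal(size=(1, d))
    v = rng.normal(size=(1, d))

    # Append just the new key/value; past entries are reused, not recomputed.
    kv_cache["K"] = np.vstack([kv_cache["K"], k])
    kv_cache["V"] = np.vstack([kv_cache["V"], v])

    # Attention for the next token: one query against ALL cached keys.
    scores = q @ kv_cache["K"].T / np.sqrt(d)   # shape (1, step)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    _ = weights @ kv_cache["V"]

    # Per-token cost is proportional to the number of cached keys.
    costs.append(kv_cache["K"].shape[0])

print(costs[9], costs[99])  # cost at token 10 vs. token 100
```

At token 10 the new query attends over 10 cached entries; at token 100, over 100. The cache trades memory (which grows with sequence length) for avoiding quadratic recomputation of past keys and values.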
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model is given a starting token and generates a three-token sequence: 'A', 'B', 'C'. Arrange the following states of the key-value (KV) cache in the chronological order they occur during this generation process.
KV Cache State during Generation