An autoregressive language model is generating a sequence of tokens one by one. As the length of the generated sequence increases from 10 tokens to 100 tokens, what is the primary impact of the evolving key-value cache on the computation required to generate the next token?
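The effect the question asks about can be sketched in a few lines. This is a hypothetical minimal simulation (not any particular library's API): with a KV cache, the keys and values of past tokens are stored and reused, so generating the next token costs one query against all cached entries, and that cost grows linearly with the sequence length rather than requiring recomputation of the whole prefix.

```python
import numpy as np

d = 8  # head dimension (illustrative)
rng = np.random.default_rng(0)

# The KV cache starts empty and grows by one entry per generated token.
kv_cache = {"K": np.empty((0, d)), "V": np.empty((0, d))}
costs = []

for step in range(1, 101):
    # Query/key/value for the NEW token only (stand-ins for the real projections).
    q = rng.normal(size=(1, d))
    k = rng.normal(size=(1, d))
    v = rng.normal(size=(1, d))

    # Append just the new key/value; past entries are reused, not recomputed.
    kv_cache["K"] = np.vstack([kv_cache["K"], k])
    kv_cache["V"] = np.vstack([kv_cache["V"], v])

    # Attention for the next token: one query against ALL cached keys.
    scores = q @ kv_cache["K"].T / np.sqrt(d)   # shape (1, step)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    _ = weights @ kv_cache["V"]

    # Per-token cost is proportional to the number of cached keys.
    costs.append(kv_cache["K"].shape[0])

print(costs[9], costs[99])  # cost at token 10 vs. token 100
```

At token 10 the new query attends over 10 cached entries; at token 100, over 100. The cache trades memory (which grows with sequence length) for avoiding quadratic recomputation of past keys and values.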
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model is given a starting token and generates a three-token sequence: 'A', 'B', 'C'. Arrange the following states of the key-value (KV) cache in the chronological order they occur during this generation process.
KV Cache State during Generation