An auto-regressive language model is generating a long text, one token at a time. To manage memory, it employs a key-value caching strategy where it only stores the keys and values for the most recent 2048 tokens. How will the memory allocated for this cache change as the model generates the 5000th token and continues beyond it?
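The behavior can be illustrated with a minimal Python sketch (using a `deque` as a stand-in for the per-token key/value entries; the window size 2048 comes from the question, everything else is illustrative): once the window fills at token 2048, every new entry evicts the oldest one, so cache memory stays flat through token 5000 and beyond.

```python
from collections import deque

WINDOW = 2048  # sliding-window size: keys/values kept only for the most recent tokens

# Each entry stands in for one token's (key, value) pair;
# deque(maxlen=...) evicts the oldest entry automatically, like the sliding window.
kv_cache = deque(maxlen=WINDOW)

sizes = {}
for token_idx in range(1, 5001):       # generate tokens 1 .. 5000
    kv_cache.append(("key", "value"))  # cache this token's K/V
    if token_idx in (1000, 2048, 2049, 5000):
        sizes[token_idx] = len(kv_cache)

print(sizes)
# Cache grows linearly until the window fills, then plateaus at 2048 entries:
# {1000: 1000, 2048: 2048, 2049: 2048, 5000: 2048}
```

So by the 5000th token the cache memory has long since stopped growing: it is constant at the footprint of 2048 tokens' keys and values, regardless of how much further generation continues.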
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Space Complexity of Sliding Window Attention
Optimizing Memory for Long-Document Processing
Comparing KV Cache Memory Growth