Comparing KV Cache Memory Growth
An auto-regressive language model is processing an extremely long document. Compare the growth of its Key-Value (KV) cache memory usage over time under two different scenarios: (1) a standard caching mechanism that stores all previous tokens, and (2) a windowed caching mechanism that only stores the most recent 1024 tokens. Explain the fundamental difference in their space complexity as the sequence length increases.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
Space Complexity of Sliding Window Attention
Optimizing Memory for Long-Document Processing
An auto-regressive language model is generating a long text, one token at a time. To manage memory, it employs a key-value caching strategy where it only stores the keys and values for the most recent 2048 tokens. How will the memory allocated for this cache change as the model generates the 5000th token and continues beyond it?