Learn Before
Calculating Memory Growth for Token Caching
An autoregressive model uses a mechanism to store key and value vectors for previously processed tokens to speed up inference. For a sequence of 100 tokens, this storage mechanism consumes 200 MB of memory. Assuming all other model parameters remain constant, how much memory would this mechanism consume for a sequence of 400 tokens? Explain your reasoning.
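The storage mechanism described is a KV cache, whose size grows linearly with the number of tokens. A minimal sketch of the scaling arithmetic, using only the figures given in the question (200 MB for 100 tokens implies 2 MB per token; the function name is illustrative):

```python
def kv_cache_memory_mb(num_tokens: int, mb_per_token: float) -> float:
    """KV cache memory scales linearly: one key vector and one value
    vector are stored per processed token, so total memory is simply
    the per-token cost times the sequence length."""
    return num_tokens * mb_per_token

# Per-token cost implied by the question: 200 MB / 100 tokens = 2 MB/token.
mb_per_token = 200 / 100

print(kv_cache_memory_mb(400, mb_per_token))  # 800.0 MB for 400 tokens
```

Because the relationship is linear, quadrupling the sequence length from 100 to 400 tokens quadruples the cache memory from 200 MB to 800 MB.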
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reducing KV Cache Complexity via Windowed Caching
An engineer is deploying a large autoregressive model for a chatbot. They observe that as a conversation with a user gets longer, the model's memory consumption increases steadily, eventually leading to performance issues. This is because the model stores key and value vectors for every token in the conversation history to speed up the generation of the next token. Based on this mechanism, what is the fundamental relationship between the length of the conversation history (in tokens) and the amount of memory required for this storage?
KV Cache Memory Footprint Comparison
Calculating Memory Growth for Token Caching
Reducing KV Cache Complexity via Head Sharing
Formula for KV Cache Memory Size