Learn Before
An autoregressive language model uses a key-value cache to store contextual information during text generation. A developer decides to double the maximum sequence length that the model can process. Assuming all other architectural parameters (such as the number of layers, number of attention heads, and the dimensionality of each head) remain constant, by what factor will the maximum memory required for the key-value cache change?
2
The key-value cache stores one key vector and one value vector per token, per attention head, per layer, so its memory footprint grows linearly with sequence length. Doubling the maximum sequence length therefore doubles the maximum memory required for the cache.
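The linear scaling can be checked with a short sketch. The formula below is the standard per-token KV-cache accounting (2 tensors, K and V, each of size layers x heads x head_dim per token); the specific layer/head/dimension values are illustrative assumptions, not taken from the question.

```python
def kv_cache_bytes(seq_len, num_layers=32, num_heads=32, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    # K and V each store one vector of size head_dim per token,
    # per attention head, per layer (hence the leading factor of 2).
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return (2 * batch_size * num_layers * num_heads * head_dim
            * seq_len * bytes_per_elem)

base = kv_cache_bytes(seq_len=4096)
doubled = kv_cache_bytes(seq_len=8192)
print(doubled / base)  # -> 2.0: doubling seq_len doubles cache memory
```

Because sequence length enters the product exactly once, every other parameter cancels in the ratio, which is why the answer is 2 regardless of the model's size.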
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Optimizing KV Cache for a Chatbot Application
KV Cache Memory Calculation