Learn Before
Chatbot Memory Optimization
The engineer proposes modifying the model to store and use only the keys and values from the most recent 1024 tokens when generating a new response. Evaluate this proposed solution: explain how it addresses the memory issue, and describe one significant potential drawback of the approach.
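As a starting point for the evaluation, the proposal can be sketched as a fixed-size key-value cache that evicts the oldest entry once the window is full. This is a minimal illustration, not the engineer's actual implementation: the class name `SlidingWindowKVCache`, the helper `kv_cache_bytes`, and the model dimensions (32 layers, 32 heads, head dimension 128, 2 bytes per value) are all hypothetical and chosen only to make the arithmetic concrete.

```python
from collections import deque


def kv_cache_bytes(cached_tokens, n_layers=32, n_heads=32, head_dim=128,
                   bytes_per_value=2):
    """Approximate key-value cache size in bytes.

    The model dimensions are hypothetical defaults, not taken from the
    question. The leading factor of 2 counts keys and values separately.
    """
    return 2 * n_layers * n_heads * head_dim * bytes_per_value * cached_tokens


class SlidingWindowKVCache:
    """Toy cache that keeps only the most recent `window` (key, value) pairs."""

    def __init__(self, window=1024):
        self.window = window
        # A deque with maxlen drops the oldest entry automatically when full.
        self.entries = deque(maxlen=window)

    def append(self, key, value):
        self.entries.append((key, value))

    def __len__(self):
        return len(self.entries)


# Simulate a 5000-token conversation; strings stand in for real tensors.
cache = SlidingWindowKVCache(window=1024)
for position in range(5000):
    cache.append(f"k{position}", f"v{position}")

full_bytes = kv_cache_bytes(5000)           # cache that keeps every token
windowed_bytes = kv_cache_bytes(len(cache))  # cache capped at the window
oldest_key_kept = cache.entries[0][0]        # everything earlier was evicted
```

Under these assumed dimensions, the windowed cache stays at the same size no matter how long the conversation runs, while the unbounded cache grows linearly with the number of tokens. The eviction also shows the cost: any token older than the window is no longer available to attention, so information stated early in a long conversation cannot be attended to directly.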
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A large language model is configured to process text by only storing and considering the keys and values of the most recent 512 tokens when calculating attention for each new token. As the model processes a document that grows from 1,000 tokens to 100,000 tokens in length, how will the memory required for this key-value storage be affected?
Chatbot Memory Optimization
Comparing Memory Usage of Attention Mechanisms