Comparing Memory Usage of Attention Mechanisms
A large language model is processing a very long document (sequence length m). Compare the growth of the Key-Value (KV) cache's memory footprint as m increases for two scenarios: (1) the model uses standard attention, and (2) the model uses sliding window attention with a fixed window size m_w. Explain which approach is more scalable for processing extremely long sequences and why.
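The contrast the question asks for can be made concrete with a small sketch. A minimal model, assuming one cached (key, value) pair per processed token per layer: standard attention keeps every token's keys and values, so the cache grows linearly in the sequence length m, while sliding window attention retains at most m_w recent tokens, so the cache size is bounded by a constant regardless of m. The window size 4096 below is purely illustrative, not taken from any particular model.

```python
def kv_cache_entries(seq_len, window=None):
    """Number of cached (key, value) pairs per layer per head.

    Standard attention (window=None) caches every processed token,
    so memory grows linearly: O(m).
    Sliding window attention keeps only the most recent `window`
    tokens, so memory is bounded: O(m_w), constant once seq_len
    exceeds the window.
    """
    if window is None:
        return seq_len
    return min(seq_len, window)

# Illustrative comparison as the document grows (window=4096 is a
# hypothetical choice for the sketch, not a specific model's setting):
for m in (1_000, 10_000, 100_000):
    std = kv_cache_entries(m)
    swa = kv_cache_entries(m, window=4096)
    print(f"m={m:>7}: standard={std:>7} entries, sliding window={swa} entries")
```

Once m exceeds the window, the sliding-window cache stops growing entirely, which is why it scales to extremely long sequences where the linearly growing standard cache would eventually exhaust memory.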
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A large language model is configured to process text by only storing and considering the keys and values of the most recent 512 tokens when calculating attention for each new token. As the model processes a document that grows from 1,000 tokens to 100,000 tokens in length, how will the memory required for this key-value storage be affected?
Chatbot Memory Optimization
Comparing Memory Usage of Attention Mechanisms