Increased Memory Overhead in Chunked Prefilling
Processing the input chunk by chunk means the Key-Value (KV) cache of every previously processed chunk must stay resident in memory while subsequent chunks are handled, because each new chunk's tokens attend to all earlier positions. Holding these intermediate cache states across multiple forward passes raises memory consumption relative to standard prefilling, where the cache is built in a single pass.
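As a rough illustration, here is a minimal single-head NumPy sketch of chunked prefill attention. The identity projections, lack of batching, and all sizes are simplifying assumptions for illustration, not details from the source. The point it demonstrates: each new chunk appends its keys and values to the cache and then attends over everything accumulated so far, so the cache of earlier chunks cannot be freed until prefilling finishes, and the resident cache grows with every chunk processed.

```python
import numpy as np

# Toy dimensions, chosen only for illustration (assumptions, not from the source).
SEQ_LEN, CHUNK, D = 2048, 256, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((SEQ_LEN, D))  # stand-in for the projected token states

k_cache = np.empty((0, D))  # KV cache of all chunks seen so far; must stay resident
v_cache = np.empty((0, D))

for start in range(0, SEQ_LEN, CHUNK):
    chunk = x[start:start + CHUNK]
    q, k, v = chunk, chunk, chunk            # identity projections, for brevity
    k_cache = np.concatenate([k_cache, k])   # append this chunk's keys/values
    v_cache = np.concatenate([v_cache, v])

    # Each query in the chunk attends to every cached position up to itself,
    # which is why earlier chunks' keys/values cannot be discarded.
    scores = q @ k_cache.T / np.sqrt(D)
    q_pos = np.arange(start, start + len(chunk))[:, None]
    k_pos = np.arange(len(k_cache))[None, :]
    scores = np.where(k_pos <= q_pos, scores, -np.inf)  # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    _out = weights @ v_cache

    # The memory held by the cache grows with each completed chunk.
    print(f"chunk ending at {start + len(chunk):4d}: "
          f"cache holds {len(k_cache)} positions, "
          f"{k_cache.nbytes + v_cache.nbytes} bytes")
```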
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reduced Prefilling Parallelism in Chunked Prefilling
A large language model is processing a long input sequence to populate its Key-Value (KV) cache before starting token generation. Which statement best analyzes the fundamental difference between processing the entire sequence in a single forward pass versus processing it in sequential segments?
Analysis of KV Cache Population
Forward Pass Calculation for KV Cache Population
Learn After
A system is designed to handle a very long input sequence by processing it in several smaller, sequential segments instead of all at once. This segmented approach can paradoxically lead to a higher peak memory requirement during processing. What is the fundamental reason for this increased memory overhead?
Memory Usage in Segmented Input Processing
Diagnosing Memory Issues in a Language Model System