Increased Memory Overhead in Chunked Prefilling
Processing the input chunk by chunk means the Key-Value (KV) cache of every previously processed chunk must stay resident in memory while subsequent chunks are handled, because each new chunk's tokens attend to all earlier positions. Holding these intermediate cache states across multiple forward passes raises memory consumption relative to standard prefilling, where the cache is built in a single pass.
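As a rough illustration, here is a minimal single-head NumPy sketch of chunked prefill attention. The identity projections, lack of batching, and all sizes are simplifying assumptions for illustration, not details from the source. The point it demonstrates: each new chunk appends its keys and values to the cache and then attends over everything accumulated so far, so the cache of earlier chunks cannot be freed until prefilling finishes, and the resident cache grows with every chunk processed.

```python
import numpy as np

# Toy dimensions, chosen only for illustration (assumptions, not from the source).
SEQ_LEN, CHUNK, D = 2048, 256, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((SEQ_LEN, D))  # stand-in for the projected token states

k_cache = np.empty((0, D))  # KV cache of all chunks seen so far; must stay resident
v_cache = np.empty((0, D))

for start in range(0, SEQ_LEN, CHUNK):
    chunk = x[start:start + CHUNK]
    q, k, v = chunk, chunk, chunk            # identity projections, for brevity
    k_cache = np.concatenate([k_cache, k])   # append this chunk's keys/values
    v_cache = np.concatenate([v_cache, v])

    # Each query in the chunk attends to every cached position up to itself,
    # which is why earlier chunks' keys/values cannot be discarded.
    scores = q @ k_cache.T / np.sqrt(D)
    q_pos = np.arange(start, start + len(chunk))[:, None]
    k_pos = np.arange(len(k_cache))[None, :]
    scores = np.where(k_pos <= q_pos, scores, -np.inf)  # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    _out = weights @ v_cache

    # The memory held by the cache grows with each completed chunk.
    print(f"chunk ending at {start + len(chunk):4d}: "
          f"cache holds {len(k_cache)} positions, "
          f"{k_cache.nbytes + v_cache.nbytes} bytes")
```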
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reduced Prefilling Parallelism in Chunked Prefilling
A large language model is processing a long input sequence to populate its Key-Value (KV) cache before starting token generation. Which statement best analyzes the fundamental difference between processing the entire sequence in a single forward pass versus processing it in sequential segments?
Analysis of KV Cache Population
Forward Pass Calculation for KV Cache Population
Learn After
A system is designed to handle a very long input sequence by processing it in several smaller, sequential segments instead of all at once. This segmented approach can paradoxically lead to a higher peak memory requirement during processing. What is the fundamental reason for this increased memory overhead?
Memory Usage in Segmented Input Processing
Diagnosing Memory Issues in a Language Model System