Increased Memory Overhead in Chunked Prefilling

A consequence of processing the input chunk by chunk is that the Key-Value (KV) cache of all previously processed chunks must be kept in memory while each subsequent chunk is handled, since later tokens attend to earlier ones. Holding these intermediate cache states results in higher memory consumption for chunked prefilling than for standard prefilling, where the cache is built in a single pass.
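The growth of resident KV-cache memory across chunks can be sketched with a toy calculation. This is an illustrative model only, not any real inference engine: the layer count, head count, head dimension, and fp16 element size below are assumed values chosen for the example.

```python
# Toy illustration of KV-cache memory held during chunked prefill.
# All model dimensions are illustrative assumptions, not a specific model.

BYTES_PER_ELEM = 2   # fp16 (assumed)
N_LAYERS = 32        # assumed model depth
N_KV_HEADS = 8       # assumed number of KV heads
HEAD_DIM = 128       # assumed head dimension

def kv_cache_bytes(n_tokens: int) -> int:
    """Bytes needed to cache K and V for n_tokens across all layers."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM  # K + V
    return n_tokens * per_token

def chunked_prefill_peaks(prompt_len: int, chunk_size: int) -> list[int]:
    """KV-cache size after each chunk is processed.

    Earlier chunks' K/V entries must stay resident so that later chunks
    can attend to them, so the footprint grows monotonically until the
    whole prompt has been prefilled.
    """
    peaks = []
    done = 0
    while done < prompt_len:
        done = min(done + chunk_size, prompt_len)
        peaks.append(kv_cache_bytes(done))
    return peaks

if __name__ == "__main__":
    for i, b in enumerate(chunked_prefill_peaks(4096, 1024), 1):
        print(f"after chunk {i}: {b / 2**20:.0f} MiB of KV cache held")
```

Note that the final cache size equals that of standard prefilling; the overhead comes from having to keep every intermediate state resident across the whole chunked schedule rather than materializing the cache in one pass.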

Updated 2026-05-06

Ch.5 Inference - Foundations of Large Language Models
