Comparison

Comparison of Processing in Chunked vs. Standard Prefilling

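Standard prefilling processes the entire input sequence in a single forward pass, constructing the Key-Value (KV) cache for all input tokens at once. In contrast, chunked prefilling processes the input sequentially in smaller segments (chunks) and runs a separate forward pass for each chunk. In each pass, the chunk's keys and values are added to the KV cache, and each of its queries attends causally to every cached position up to its own, covering both the earlier chunks and the chunk itself. The KV cache therefore grows progressively, chunk by chunk. Because attention is causal, the final attention outputs and KV cache match those of standard prefilling, up to floating-point differences.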
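The two strategies can be compared in a minimal sketch. It assumes a toy model: a single causal self-attention layer with one head and random weights standing in for a real LLM. All names (`prefill_chunk`, `standard_prefill`, `chunked_prefill`, `chunk_size`) are hypothetical and not taken from any library. Standard prefilling runs one pass over all tokens. Chunked prefilling runs one pass per chunk against a shared, growing KV cache.

```python
import math
import random

# Hypothetical toy model: one causal self-attention layer with a single head.
# Vectors are plain lists; matrices are lists of rows.


def matvec(W, x):
    """Multiply matrix W (rows of length len(x)) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def prefill_chunk(chunk, Wq, Wk, Wv, kv_cache):
    """One forward pass over `chunk`, extending `kv_cache` in place."""
    start = len(kv_cache["k"])  # absolute position of the chunk's first token
    for x in chunk:
        kv_cache["k"].append(matvec(Wk, x))
        kv_cache["v"].append(matvec(Wv, x))
    outputs = []
    for i, x in enumerate(chunk):
        q = matvec(Wq, x)
        pos = start + i
        # Causal mask: attend to every cached position up to and including pos,
        # i.e. all earlier chunks plus the current chunk's prefix.
        outputs.append(attend(q, kv_cache["k"][: pos + 1], kv_cache["v"][: pos + 1]))
    return outputs


def standard_prefill(tokens, Wq, Wk, Wv):
    """Build the whole KV cache in a single forward pass."""
    kv_cache = {"k": [], "v": []}
    outputs = prefill_chunk(tokens, Wq, Wk, Wv, kv_cache)
    return outputs, kv_cache, 1


def chunked_prefill(tokens, Wq, Wk, Wv, chunk_size):
    """Build the KV cache progressively, one forward pass per chunk."""
    kv_cache = {"k": [], "v": []}
    outputs, passes = [], 0
    for s in range(0, len(tokens), chunk_size):
        outputs.extend(prefill_chunk(tokens[s : s + chunk_size], Wq, Wk, Wv, kv_cache))
        passes += 1
    return outputs, kv_cache, passes


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(10, d)  # 10 toy token embeddings of width d

out_std, cache_std, passes_std = standard_prefill(tokens, Wq, Wk, Wv)
out_chk, cache_chk, passes_chk = chunked_prefill(tokens, Wq, Wk, Wv, chunk_size=3)

max_diff = max(abs(a - b) for ra, rb in zip(out_std, out_chk) for a, b in zip(ra, rb))
print(passes_std, passes_chk, max_diff)
```

With 10 tokens and `chunk_size=3`, chunked prefilling needs four forward passes (chunks of 3, 3, 3, and 1 tokens) instead of one. It still produces the same attention outputs and KV cache. A real model keeps a separate cache per layer and per attention head and operates on batched tensors; the sketch leaves that out to keep the chunking logic visible.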

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
