Concept

Reduced Prefilling Parallelism in Chunked Prefilling

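The chunk-by-chunk approach of chunked prefilling sacrifices much of the parallelism that makes standard prefilling efficient. Standard prefilling processes the entire prompt in a single forward pass and computes all token positions in parallel. Chunked prefilling instead splits the prompt into smaller chunks and runs one forward pass per chunk, in sequence. A prompt of n tokens split into chunks of c tokens therefore needs ⌈n/c⌉ passes, and each chunk attends to the key-value cache left by the earlier chunks. Because each pass covers fewer tokens, less work runs in parallel per pass, and part of the efficiency of full parallel execution is lost.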
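To make the trade-off concrete, here is a minimal sketch of single-layer, single-head causal attention over a toy prompt. It assumes identity projections (queries, keys, and values are the token vectors themselves), and the function names `attend`, `full_prefill`, and `chunked_prefill` are hypothetical. Both routines produce the same outputs. The difference is that `full_prefill` handles every position in one pass, while `chunked_prefill` needs ⌈n/c⌉ sequential passes, each covering at most `chunk_size` query positions and reading the cache built by earlier chunks.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys and values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d)]


def full_prefill(tokens: List[Vector]) -> Tuple[List[Vector], int]:
    """Standard prefilling: every position is handled in ONE forward pass.
    Position i attends to positions 0..i (causal mask). The positions do not
    depend on each other's outputs, so hardware can compute them all at once."""
    outputs = [attend(tokens[i], tokens[: i + 1], tokens[: i + 1]) for i in range(len(tokens))]
    return outputs, 1  # (outputs, number of forward passes)


def chunked_prefill(tokens: List[Vector], chunk_size: int) -> Tuple[List[Vector], int]:
    """Chunked prefilling: one forward pass per chunk, run in order.
    Each chunk's queries attend to its own keys/values plus the cache
    accumulated from all earlier chunks."""
    cache_keys: List[Vector] = []
    cache_values: List[Vector] = []
    outputs: List[Vector] = []
    passes = 0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start : start + chunk_size]
        passes += 1  # one more sequential forward pass
        # Assumption: identity projections, so keys and values are the token vectors.
        cache_keys.extend(chunk)
        cache_values.extend(chunk)
        for offset, query in enumerate(chunk):
            visible = start + offset + 1  # causal mask: positions 0..start+offset
            outputs.append(attend(query, cache_keys[:visible], cache_values[:visible]))
    return outputs, passes


# A toy 10-token prompt of 3-dimensional vectors (made-up values).
prompt = [[float(i % 3), float(i % 5), 1.0] for i in range(10)]
full_out, full_passes = full_prefill(prompt)
chunk_out, chunk_passes = chunked_prefill(prompt, chunk_size=4)
print(full_passes, chunk_passes)  # 1 3
print(full_out == chunk_out)      # True
```

Setting `chunk_size=1` forces one sequential pass per token, while a chunk size at least as large as the prompt collapses back to the single pass of standard prefilling. The sketch only counts passes; it does not model actual hardware utilization.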

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences