Definition

Iteration in Continuous Batching

In continuous batching, an iteration is a distinct step of computation, corresponding to either the full prefilling phase for a given input or a single token's decoding step. For instance, given an input sequence $\mathbf{x} = x_0 \ldots x_m$ and a target output sequence $\mathbf{y} = y_1 \ldots y_n$, processing requires a total of $n+1$ iterations. This includes one initial iteration to handle prefilling, followed by $n$ iterations to generate the output sequence, yielding one token per iteration.
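The count of $n+1$ iterations can be checked with a minimal sketch for a single request. The names `Request`, `step`, and `count_iterations` are hypothetical and belong to no library. The sketch follows the convention of the definition above, in which the prefilling iteration emits no output token, and it does not model how a scheduler batches several requests together.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical bookkeeping for one request (illustration only)."""
    prompt_len: int          # m + 1 input tokens x_0 ... x_m
    target_len: int          # n output tokens y_1 ... y_n
    prefilled: bool = False  # True once the prefilling iteration has run
    generated: int = 0       # output tokens produced so far

    def done(self) -> bool:
        return self.prefilled and self.generated >= self.target_len


def step(req: Request) -> str:
    """Run one iteration for the request and return its kind."""
    if not req.prefilled:
        # The whole prompt is processed in a single iteration.
        req.prefilled = True
        return "prefill"
    # Each decoding iteration yields exactly one output token.
    req.generated += 1
    return "decode"


def count_iterations(prompt_len: int, target_len: int) -> int:
    """Count the iterations needed to finish one request."""
    req = Request(prompt_len, target_len)
    iterations = 0
    while not req.done():
        step(req)
        iterations += 1
    return iterations


# A prompt of 4 tokens with n = 3 output tokens needs n + 1 = 4 iterations.
assert count_iterations(prompt_len=4, target_len=3) == 4
```

The prompt length does not affect the count: prefilling always takes one iteration, however long the input is.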

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models