Simultaneous Token Generation in Batched Decoding
During the decoding phase of batched inference, a large language model generates tokens simultaneously for all sequences in the batch: at each decoding step, one new token is produced for every sequence. Generation continues until the longest sequence in the batch has finished, as illustrated in the sketch below.
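The following is a minimal sketch of this loop, assuming a hypothetical `model` callable that returns logits of shape (batch, seq_len, vocab); the names and interface are illustrative, not any specific library's API.

```python
import torch

def batched_greedy_decode(model, input_ids, eos_id, max_new_tokens):
    """Greedy batched decoding: one token per sequence per step.

    `model` is an assumed causal LM callable mapping token IDs of shape
    (batch, seq_len) to logits of shape (batch, seq_len, vocab).
    """
    batch = input_ids.size(0)
    finished = torch.zeros(batch, dtype=torch.bool)
    for _ in range(max_new_tokens):
        logits = model(input_ids)                       # (batch, seq_len, vocab)
        next_tokens = logits[:, -1, :].argmax(dim=-1)   # one token per sequence
        # Sequences that already emitted EOS keep padding with EOS;
        # they still occupy their batch slot until the longest one finishes.
        next_tokens = torch.where(
            finished, torch.full_like(next_tokens, eos_id), next_tokens
        )
        input_ids = torch.cat([input_ids, next_tokens.unsqueeze(1)], dim=1)
        finished |= next_tokens.eq(eos_id)
        if finished.all():  # stop once every sequence, including the longest, is done
            break
    return input_ids
```

Note that sequences which finish early continue to occupy their batch slots (padded with EOS here) until the longest sequence completes, which is one source of the wasted compute that motivates the throughput-latency trade-off discussed in the related items.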
Tags
Foundations of Large Language Models
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Aggregated Architecture for Prefilling and Decoding
Static Batching
A technology company is optimizing its popular chatbot service, which is powered by a large language model and handles thousands of simultaneous user queries. To manage this high load, their engineers implement a system that waits to collect several user queries and processes them together as a single group in one computational step. Which of the following outcomes is the most direct and significant advantage of this approach?
Analyzing LLM Serving Strategies
Efficiency of Sequential vs. Batched Processing
Throughput-Latency Trade-off in LLM Inference
Sequence Concatenation in Disaggregated Inference